Nucleotide sequence of the Haemophilus influenzae Rd genome, fragments thereof, and uses thereof

ABSTRACT

The present invention provides the sequencing of the entire genome of Haemophilus influenzae Rd, SEQ ID NO: 1. The present invention further provides the sequence information stored on computer readable media, and computer-based systems and methods which facilitate its use. In addition to the entire genomic sequence, the present invention identifies over 1700 protein encoding fragments of the genome and identifies, by position relative to a unique NotI restriction endonuclease site, any regulatory elements which modulate the expression of the protein encoding fragments of the Haemophilus genome.

CROSS-REFERENCE TO RELATED APPLICATIONS

[0001] This application is a divisional of and claims priority under 35 U.S.C. § 120 to U.S. application Ser. No. 09/643,990, filed Aug. 23, 2000, which is a continuation of and claims priority under 35 U.S.C. § 120 to U.S. application Ser. No. 08/487,429, filed Jun. 7, 1995, which is a continuation-in-part of and claims priority under 35 U.S.C. § 120 to U.S. application Ser. No. 08/426,787, filed Apr. 21, 1995, which is hereby incorporated by reference in its entirety.

STATEMENTS AS TO RIGHTS TO INVENTIONS MADE UNDER FEDERALLY-SPONSORED RESEARCH AND DEVELOPMENT

[0002] Part of the work performed during development of this invention utilized U.S. Government funds. The government may have certain rights in this invention. NIH-5R01GM48251

REFERENCE TO A SEQUENCE LISTING PROVIDED ON COMPACT DISC

[0003] This application refers to a “Sequence Listing”, which is provided as an electronic document on two identical compact discs (CD-R), labeled “Copy 1” and “Copy 2.” These compact discs each contain the electronic document, filename “PB186P1C1D1.ST25.txt” (2,357,775 bytes in size, created on Dec. 24, 2002), which is hereby incorporated in its entirety herein.

FIELD OF THE INVENTION

[0004] The present invention relates to the field of molecular biology. The present invention discloses compositions comprising the nucleotide sequence of Haemophilus influenzae, fragments thereof and usage in industrial fermentation and pharmaceutical development.

BACKGROUND OF THE INVENTION

[0005] The complete genome sequence from a free living cellular organism has never been determined. The first mycobacterium sequence should be completed by 1996, while E. coli and S. cerevisiae are expected to be completed before 1998. These are being done by random and/or directed sequencing of overlapping cosmid clones. No one has attempted to determine sequences of the order of a megabase or more by a random shotgun approach.

[0006] H. influenzae is a small (approximately 0.4×1 micron) non-motile, non-spore forming, gram-negative bacterium whose only natural host is human. It is a resident of the upper respiratory mucosa of children and adults and causes otitis media and respiratory tract infections mostly in children. The most serious complication is meningitis, which produces neurological sequelae in up to 50% of affected children. Six H. influenzae serotypes (a through f) have been identified based on immunologically distinct capsular polysaccharide antigens. A number of non-typeable strains are also known. Serotype b accounts for the majority of human disease.

[0007] Interest in the medically important aspects of H. influenzae biology has focused particularly on those genes which determine virulence characteristics of the organism. A number of the genes responsible for the capsular polysaccharide have been mapped and sequenced (Kroll et al., Mol. Microbiol. 5(6):1549-1560 (1991)). Several outer membrane protein (OMP) genes have been identified and sequenced (Langford et al., J. Gen. Microbiol. 138:155-159 (1992)). The lipooligosaccharide (LOS) component of the outer membrane and the genes of its synthetic pathway are under intensive study (Weiser et al., J. Bacteriol. 172:3304-3309 (1990)). While a vaccine has been available since 1984, the study of outer membrane components is motivated to some extent by the need for improved vaccines. Recently, the catalase gene was characterized and sequenced as a possible virulence-related gene (Bishni et al., in press). Elucidation of the H. influenzae genome will enhance the understanding of how H. influenzae causes invasive disease and how best to combat infection.

[0008] H. influenzae possesses a highly efficient natural DNA transformation system which has been intensively studied in the non-encapsulated (R), serotype d strain (Kahn and Smith, J. Membrane Biology 81:89-103 (1984)). At least 16 transformation-specific genes have been identified and sequenced. Of these, four are regulatory (Redfield, J. Bacteriol. 173:5612-5618 (1991), and Chandler, Proc. Natl. Acad. Sci. USA 89:1626-1630 (1992)), at least two are involved in recombination processes (Barouki and Smith, J. Bacteriol. 163(2):629-634 (1985)), and at least seven are targeted to the membranes and periplasmic space (Tomb et al., Gene 104:1-10 (1991), and Tomb, Proc. Natl. Acad. Sci. USA 89:10252-10256 (1992)), where they appear to function as structural components or in the assembly of the DNA transport machinery. H. influenzae Rd transformation shows a number of interesting features including sequence-specific DNA uptake, rapid uptake of several double-stranded DNA molecules per competent cell into a membrane compartment called the transformasome, linear translocation of a single strand of the donor DNA into the cytoplasm, and synapsis and recombination of the strand with the chromosome by a single-strand displacement mechanism. The H. influenzae Rd transformation system is the most thoroughly studied of the gram-negative systems and distinct in a number of ways from the gram-positive systems.

[0009] The size of the H. influenzae Rd genome has been determined by pulsed-field agarose gel electrophoresis of restriction digests to be approximately 1.9 Mb, making its genome approximately 40% the size of that of E. coli (Lee and Smith, J. Bacteriol. 170:4402-4405 (1988)). The restriction map of H. influenzae is circular (Lee et al., J. Bacteriol. 171:3016-3024 (1989), and Redfield and Lee, “Haemophilus influenzae Rd”, pp. 2110-2112, In O'Brien, S. J. (ed), Genetic Maps: Locus Maps of Complex Genomes, Cold Spring Harbor Press, New York). Various genes have been mapped to restriction fragments by Southern hybridization probing of restriction digest DNA bands. This map will be valuable in verification of the assembly of a complete genome sequence from randomly sequenced fragments. GenBank currently contains about 100 kb of non-redundant H. influenzae DNA sequences. About half are from serotype b and half from Rd.

SUMMARY OF THE INVENTION

[0010] The present invention is based on the sequencing of the Haemophilus influenzae Rd genome. The primary nucleotide sequence which was generated is provided in SEQ ID NO: 1.

[0011] The present invention provides the generated nucleotide sequence of the Haemophilus influenzae Rd genome, or a representative fragment thereof, in a form which can be readily used, analyzed, and interpreted by a skilled artisan. In one embodiment, the sequence is provided as a contiguous string of primary sequence information corresponding to the nucleotide sequence depicted in SEQ ID NO: 1.

[0012] The present invention further provides nucleotide sequences which are at least 99.9% identical to the nucleotide sequence of SEQ ID NO: 1.

[0013] The nucleotide sequence of SEQ ID NO: 1, a representative fragment thereof, or a nucleotide sequence which is at least 99.9% identical to the nucleotide sequence of SEQ ID NO: 1 may be provided in a variety of mediums to facilitate its use. In one application of this embodiment, the sequences of the present invention are recorded on computer readable media. Such media include, but are not limited to: magnetic storage media, such as floppy discs, hard disc storage medium, and magnetic tape; optical storage media such as CD-ROM; electrical storage media such as RAM and ROM; and hybrids of these categories such as magnetic/optical storage media.

[0014] The present invention further provides systems, particularly computer-based systems, which contain the sequence information herein described stored in a data storage means. Such systems are designed to identify commercially important fragments of the Haemophilus influenzae Rd genome.

[0015] Another embodiment of the present invention is directed to isolated fragments of the Haemophilus influenzae Rd genome. The fragments of the Haemophilus influenzae Rd genome of the present invention include, but are not limited to, fragments which encode peptides, hereinafter open reading frames (ORFs), fragments which modulate the expression of an operably linked ORF, hereinafter expression modulating fragments (EMFs), fragments which mediate the uptake of a linked DNA fragment into a cell, hereinafter uptake modulating fragments (UMFs), and fragments which can be used to diagnose the presence of Haemophilus influenzae Rd in a sample, hereinafter diagnostic fragments (DFs).

[0016] Each of the ORF fragments of the Haemophilus influenzae Rd genome disclosed in Tables 1(a) and 2, and the EMF found 5′ to the ORF, can be used in numerous ways as polynucleotide reagents. The sequences can be used as diagnostic probes or diagnostic amplification primers for the presence of a specific microbe in a sample, for the production of commercially important pharmaceutical agents, and to selectively control gene expression.

[0017] The present invention further includes recombinant constructs comprising one or more fragments of the Haemophilus influenzae Rd genome of the present invention. The recombinant constructs of the present invention comprise vectors, such as a plasmid or viral vector, into which a fragment of the Haemophilus influenzae Rd genome has been inserted.

[0018] The present invention further provides host cells containing any one of the isolated fragments of the Haemophilus influenzae Rd genome of the present invention. The host cells can be a higher eukaryotic host such as a mammalian cell, a lower eukaryotic cell such as a yeast cell, or can be a procaryotic cell such as a bacterial cell.

[0019] The present invention is further directed to isolated proteins encoded by the ORFs of the present invention. A variety of methodologies known in the art can be utilized to obtain any one of the proteins of the present invention. At the simplest level, the amino acid sequence can be synthesized using commercially available peptide synthesizers. In an alternative method, the protein is purified from bacterial cells which naturally produce the protein. Lastly, the proteins of the present invention can alternatively be purified from cells which have been altered to express the desired protein.

[0020] The invention further provides methods of obtaining homologs of the fragments of the Haemophilus influenzae Rd genome of the present invention and homologs of the proteins encoded by the ORFs of the present invention. Specifically, by using the nucleotide and amino acid sequences disclosed herein as a probe or as primers, and techniques such as PCR cloning and colony/plaque hybridization, one skilled in the art can obtain homologs.

[0021] The invention further provides antibodies which selectively bind one of the proteins of the present invention. Such antibodies include both monoclonal and polyclonal antibodies.

[0022] The invention further provides hybridomas which produce the above-described antibodies. A hybridoma is an immortalized cell line which is capable of secreting a specific monoclonal antibody.

[0023] The present invention further provides methods of identifying test samples derived from cells which express one of the ORFs of the present invention, or a homolog thereof. Such methods comprise incubating a test sample with one or more of the antibodies of the present invention, or one or more of the DFs of the present invention, under conditions which allow a skilled artisan to determine if the sample contains the ORF or product produced therefrom.

[0024] In another embodiment of the present invention, kits are provided which contain the necessary reagents to carry out the above-described assays.

[0025] Specifically, the invention provides a compartmentalized kit to receive, in close confinement, one or more containers which comprises: (a) a first container comprising one of the antibodies, or one of the DFs of the present invention; and (b) one or more other containers comprising one or more of the following: wash reagents, reagents capable of detecting presence of bound antibodies or hybridized DFs.

[0026] Using the isolated proteins of the present invention, the present invention further provides methods of obtaining and identifying agents capable of binding to a protein encoded by one of the ORFs of the present invention. Specifically, such agents include antibodies (described above), peptides, carbohydrates, pharmaceutical agents and the like. Such methods comprise the steps of:

[0027] (a) contacting an agent with an isolated protein encoded by one of the ORFs of the present invention; and

[0028] (b) determining whether the agent binds to said protein.

[0029] The complete genomic sequence of H. influenzae will be of great value to all laboratories working with this organism and for a variety of commercial purposes. Many fragments of the Haemophilus influenzae Rd genome will be immediately identified by similarity searches against GenBank or protein databases and will be of immediate value to Haemophilus researchers and of immediate commercial value for the production of proteins or to control gene expression. A specific example concerns PHA synthase. It has been reported that polyhydroxybutyrate is present in the membranes of H. influenzae Rd and that the amount correlates with the level of competence for transformation. The PHA synthase that synthesizes this polymer has been identified and sequenced in a number of bacteria, none of which are evolutionarily close to H. influenzae. This gene has yet to be isolated from H. influenzae by use of hybridization probes or PCR techniques. However, the genomic sequence of the present invention allows the identification of the gene by utilizing search means described below.

[0030] Developing the methodology and technology for elucidating the entire genomic sequence of bacterial and other small genomes has enhanced and will greatly enhance the ability to analyze and understand chromosomal organization. In particular, sequenced genomes will provide the models for developing tools for the analysis of chromosome structure and function, including the ability to identify genes within large segments of genomic DNA, the structure, position, and spacing of regulatory elements, the identification of genes with potential industrial applications, and the ability to perform comparative genomics and molecular phylogeny.

BRIEF DESCRIPTION OF THE FIGURES

[0031] FIG. 1—Restriction map of the Haemophilus influenzae Rd genome.

[0032] FIG. 2—Block diagram of a computer system 102 that can be used to implement the computer-based systems of the present invention.

[0033] FIG. 3—A comparison of experimental coverage of up to approximately 4000 random sequence fragments assembled with AutoAssembler (squares) as compared to Lander-Waterman prediction for a 2.5 Mb genome (triangles) and a 1.6 Mb genome (circles) with a 460 bp average sequence length and a 25 bp overlap.

[0034] FIG. 4—Data flow and computer programs used to manage, assemble, edit, and annotate the H. influenzae genome. Both Macintosh and Unix platforms are used to handle the AB 373 sequence data files (Kerlavage et al., Proceedings of the Twenty-Sixth Annual Hawaii International Conference on System Sciences, IEEE Computer Society Press, Washington D.C., 585 (1993)). Factura (AB) is a Macintosh program designed for automatic vector sequence removal and end trimming of sequence files. The program esp runs on a Macintosh platform and parses the feature data extracted from the sequence files by Factura to the Unix based H. influenzae relational database. Assembly is accomplished by retrieving a specific set of sequence files and their associated features using stp, an X-windows graphical interface and control program which can retrieve sequences from the H. influenzae database using user-defined or standard SQL queries. The sequence files were assembled using TIGR Assembler, an assembly engine designed at TIGR for rapid and accurate assembly of thousands of sequence fragments. TIGR Editor is a graphical interface which can parse the aligned sequence files from TIGR Assembler output and display the alignment and associated electropherograms for contig editing. Identification of putative coding regions was performed with Genemark (Borodovsky and McIninch, Computers Chem. 17(2):123 (1993)), a Markov and Bayes modeled program for predicting gene locations, and trained on a H. influenzae sequence data set. Peptide searches were performed against the three reading frames of each Genemark predicted coding region using blaze (Brutlag et al., Computers Chem. 17:203 (1993)) run on a Maspar MP-2 massively parallel computer with 4096 microprocessors. Results from each frame were combined into a single output file by mblzt. Optimal protein alignments were obtained using the program praze which extends alignments across potential frameshifts. The output was inspected using a custom graphic viewing program, gbyob, that interacts directly with the H. influenzae database. The alignments were further used to identify potential frameshift errors and were targeted for additional editing.

[0035] FIG. 5—A circular representation of the H. influenzae Rd chromosome illustrating the location of each predicted coding region containing a database match as well as selected global features of the genome. Outer perimeter: The location of the unique NotI restriction site (designated as nucleotide 1), the RsrII sites, and the SmaI sites. Outer concentric circle: The location of each identified coding region for which a gene identification was made. Second concentric circle: Regions of high G/C content and high A/T content. High G/C content regions are specifically associated with the 6 ribosomal operons and the mu-like prophage. Third concentric circle: Coverage by lambda clones. Over 300 lambda clones were sequenced from each end to confirm the overall structure of the genome and identify the 6 ribosomal operons. Fourth concentric circle: The locations of the 6 ribosomal operons, the tRNAs and the cryptic mu-like prophage. Fifth concentric circle: Simple tandem repeats. The locations of the following repeats are shown: CTGGCT, GTCT, ATT, AATGGC, TTGA, TTGG, TTTA, TTATC, TGAC, TCGTC, AACC, TTGC, CAAT, CCAA. The putative origin of replication is illustrated by the outward pointing arrows originating near base 603,000. Two potential termination sequences are shown near the opposite midpoint of the circle.

[0036] FIGS. 6(A)-6(AN)—Complete map of the H. influenzae Rd genome. Predicted coding regions are shown on each strand. rRNA and tRNA genes are shown as lines and triangles, respectively. GeneID numbers correspond to those in Tables 1(a), 1(b) and 2. Where possible, three-letter designations are also provided.

[0037] FIG. 7—A comparison of the region of the H. influenzae chromosome containing the 8 genes of the fimbrial gene cluster present in H. influenzae type b and the same region in H. influenzae Rd. The region is flanked by the pepN and purE genes in both organisms. However, in the non-infectious Rd strain the 8 genes of the fimbrial gene cluster have been excised. A 172 bp spacer region is located in this region in the Rd strain and continues to be flanked by the pepN and purE genes.

[0038] FIG. 8—Hydrophobicity analysis of five predicted channel-proteins. The amino acid sequences of five predicted coding regions that do not display homology with known peptide sequences (GenBank release 87) each exhibit multiple hydrophobic domains that are characteristic of channel-forming proteins. The predicted coding region sequences were analyzed by the Kyte-Doolittle algorithm (Kyte and Doolittle, J. Mol. Biol. 157:105 (1982)) (with a range of 11 residues) using the GeneWorks software package (Intelligenetics).
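The windowed hydropathy calculation referred to for FIG. 8 can be illustrated with a minimal Python sketch. It assumes the published Kyte-Doolittle index values and averages them over a sliding window of 11 residues. The function name hydropathy_profile and the example peptide are hypothetical and are not taken from the GeneWorks package.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids
# (Kyte and Doolittle, J. Mol. Biol. 157:105 (1982)).
KD_INDEX = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein, window=11):
    """Return the mean hydropathy of every full window along the protein.

    Entry i of the result is the average index of residues i .. i+window-1.
    An empty list is returned when the protein is shorter than the window.
    """
    seq = protein.upper()
    if window < 1 or len(seq) < window:
        return []
    scores = [KD_INDEX[residue] for residue in seq]
    total = sum(scores[:window])
    profile = [total / window]
    # Slide the window one residue at a time, updating the running sum.
    for i in range(window, len(seq)):
        total += scores[i] - scores[i - window]
        profile.append(total / window)
    return profile


if __name__ == "__main__":
    # Hypothetical peptide used only to exercise the function.
    example = "MKTLLVAGLLAVSSAAFAGGNDKRE"
    for position, value in enumerate(hydropathy_profile(example), start=1):
        print(position, round(value, 2))
```

Runs of strongly positive window averages are what would be read as candidate hydrophobic domains; any cutoff applied to the profile is a separate choice that this sketch does not make.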

DETAILED DESCRIPTION

[0039] The present invention is based on the sequencing of the Haemophilus influenzae Rd genome. The primary nucleotide sequence which was generated is provided in SEQ ID NO: 1. As used herein, the “primary sequence” refers to the nucleotide sequence represented by the IUPAC nomenclature system.

[0040] The sequence provided in SEQ ID NO: 1 is oriented relative to a unique NotI restriction endonuclease site found in the Haemophilus influenzae Rd genome. A skilled artisan will readily recognize that this start/stop point was chosen for convenience and does not reflect a structural significance.
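Because the chromosome is circular, choosing the NotI site as the start/stop point amounts to rotating the sequence. The following minimal Python sketch illustrates this under stated assumptions: the NotI recognition sequence is GCGGCCGC, position 1 is taken to be the first base of that site (the text does not say which base of the site is numbered 1), and the function names and the toy sequence are hypothetical.

```python
NOTI_SITE = "GCGGCCGC"  # NotI recognition sequence


def find_circular(genome, site):
    """Return 0-based start positions of site on a circular sequence."""
    n = len(genome)
    # Append the first len(site)-1 bases so that a site spanning the
    # arbitrary linear end of the circle is still found.
    extended = genome + genome[: len(site) - 1]
    hits = []
    start = extended.find(site)
    while start != -1 and start < n:
        hits.append(start)
        start = extended.find(site, start + 1)
    return hits


def orient_to_unique_site(genome, site=NOTI_SITE):
    """Rotate a circular sequence so that it begins at its unique site.

    Assumption: nucleotide 1 is the first base of the recognition site.
    """
    genome = genome.upper()
    hits = find_circular(genome, site)
    if len(hits) != 1:
        raise ValueError("expected exactly one site, found %d" % len(hits))
    start = hits[0]
    return genome[start:] + genome[:start]


if __name__ == "__main__":
    toy = "AATTGCGGCCGCTTAA"  # hypothetical toy circle, not real data
    print(orient_to_unique_site(toy))
```

The check for exactly one hit mirrors the requirement that the site be unique; a sequence with zero or several sites raises an error rather than being rotated arbitrarily.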

[0041] The present invention provides the nucleotide sequence of SEQ ID NO: 1, or a representative fragment thereof, in a form which can be readily used, analyzed, and interpreted by a skilled artisan. In one embodiment, the sequence is provided as a contiguous string of primary sequence information corresponding to the nucleotide sequence provided in SEQ ID NO: 1.

[0042] As used herein, a “representative fragment of the nucleotide sequence depicted in SEQ ID NO: 1” refers to any portion of SEQ ID NO: 1 which is not presently represented within a publicly available database. Preferred representative fragments of the present invention are Haemophilus influenzae open reading frames, expression modulating fragments, uptake modulating fragments, and fragments which can be used to diagnose the presence of Haemophilus influenzae Rd in a sample. A non-limiting identification of such preferred representative fragments is provided in Tables 1(a) and 2.

[0043] The nucleotide sequence information provided in SEQ ID NO: 1 was obtained by sequencing the Haemophilus influenzae Rd genome using a megabase shotgun sequencing method. Using three parameters of accuracy discussed in the Examples below, the present inventors have calculated that the sequence in SEQ ID NO: 1 has a maximum accuracy of 99.98%. Thus, the nucleotide sequence provided in SEQ ID NO: 1 is a highly accurate, although not necessarily a 100% perfect, representation of the nucleotide sequences of the Haemophilus influenzae Rd genome.

[0044] As discussed in detail below, using the information provided in SEQ ID NO: 1 and in Tables 1(a) and 2 together with routine cloning and sequencing methods, one of ordinary skill in the art will be able to clone and sequence all “representative fragments” of interest including open reading frames (ORFs) encoding a large variety of Haemophilus influenzae proteins. In very rare instances, this may reveal a nucleotide sequence error present in the nucleotide sequence disclosed in SEQ ID NO: 1. Thus, once the present invention is made available (i.e., once the information in SEQ ID NO: 1 and Tables 1(a) and 2 has been made available), resolving a rare sequencing error in SEQ ID NO: 1 will be well within the skill of the art. Nucleotide sequence editing software is publicly available. For example, Applied Biosystems' (AB) AutoAssembler™ can be used as an aid during visual inspection of nucleotide sequences.

[0045] Even if all of the very rare sequencing errors in SEQ ID NO: 1 were corrected, the resulting nucleotide sequence would still be at least 99.9% identical to the nucleotide sequence in SEQ ID NO: 1.

[0046] The nucleotide sequences of the genomes from different strains of Haemophilus influenzae differ slightly. However, the nucleotide sequence of the genomes of all Haemophilus influenzae strains will be at least 99.9% identical to the nucleotide sequence provided in SEQ ID NO: 1.

[0047] Thus, the present invention further provides nucleotide sequences which are at least 99.9% identical to the nucleotide sequence of SEQ ID NO: 1 in a form which can be readily used, analyzed and interpreted by the skilled artisan. Methods for determining whether a nucleotide sequence is at least 99.9% identical to the nucleotide sequence of SEQ ID NO: 1 are routine and readily available to the skilled artisan. For example, the well known fasta algorithm (Pearson and Lipman, Proc. Natl. Acad. Sci. USA 85:2444 (1988)) can be used to generate the percent identity of nucleotide sequences.
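The percent identity test can be illustrated with a minimal Python sketch. This is not the fasta algorithm: it assumes the two sequences have already been aligned to equal length by some alignment program, with "-" marking gaps, and simply counts matching columns. The function names and the convention that a gap opposite a base counts as a mismatch are assumptions made for illustration.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of alignment columns in which both sequences agree.

    Both inputs must be pre-aligned to the same length, with '-' for gaps.
    Columns that are gaps in both sequences are ignored; a gap opposite a
    base is counted as a mismatch (an assumption of this sketch).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    compared = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" and b == "-":
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


def meets_threshold(aligned_a, aligned_b, threshold=99.9):
    """True when the identity is at least the stated threshold."""
    return percent_identity(aligned_a, aligned_b) >= threshold


if __name__ == "__main__":
    reference = "A" * 1000              # hypothetical toy sequences
    candidate = "A" * 999 + "C"
    print(percent_identity(reference, candidate))
    print(meets_threshold(reference, candidate))
```

Different tools define identity over different denominators (alignment length, shorter sequence, or ungapped columns), so the convention should be fixed before a 99.9% cutoff is applied.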

[0048] Computer Related Embodiments

[0049] The nucleotide sequence provided in SEQ ID NO: 1, a representative fragment thereof, or a nucleotide sequence at least 99.9% identical to SEQ ID NO: 1 may be “provided” in a variety of mediums to facilitate use thereof. As used herein, “provided” refers to a manufacture, other than an isolated nucleic acid molecule, which contains a nucleotide sequence of the present invention, i.e., the nucleotide sequence provided in SEQ ID NO: 1, a representative fragment thereof, or a nucleotide sequence at least 99.9% identical to SEQ ID NO: 1. Such a manufacture provides the Haemophilus influenzae Rd genome or a subset thereof (e.g., a Haemophilus influenzae Rd open reading frame (ORF)) in a form which allows a skilled artisan to examine the manufacture using means not directly applicable to examining the Haemophilus influenzae Rd genome or a subset thereof as it exists in nature or in purified form.

[0050] In one application of this embodiment, a nucleotide sequence of the present invention can be recorded on computer readable media. As used herein, “computer readable media” refers to any medium which can be read and accessed directly by a computer. Such media include, but are not limited to: magnetic storage media, such as floppy discs, hard disc storage medium, and magnetic tape; optical storage media such as CD-ROM; electrical storage media such as RAM and ROM; and hybrids of these categories such as magnetic/optical storage media. A skilled artisan can readily appreciate how any of the presently known computer readable mediums can be used to create a manufacture comprising computer readable medium having recorded thereon a nucleotide sequence of the present invention.

[0051] As used herein, “recorded” refers to a process for storing information on computer readable medium. A skilled artisan can readily adopt any of the presently known methods for recording information on computer readable medium to generate manufactures comprising the nucleotide sequence information of the present invention.

[0052] A variety of data storage structures are available to a skilled artisan for creating a computer readable medium having recorded thereon a nucleotide sequence of the present invention. The choice of the data storage structure will generally be based on the means chosen to access the stored information. In addition, a variety of data processor programs and formats can be used to store the nucleotide sequence information of the present invention on computer readable medium. The sequence information can be represented in a word processing text file, formatted in commercially-available software such as WordPerfect and MicroSoft Word, or represented in the form of an ASCII file, stored in a database application, such as DB2, Sybase, Oracle, or the like. A skilled artisan can readily adapt any number of data processor structuring formats (e.g. text file or database) in order to obtain computer readable medium having recorded thereon the nucleotide sequence information of the present invention.

[0053] By providing the nucleotide sequence SEQ ID NO: 1, a representative fragment thereof, or a nucleotide sequence at least 99.9% identical to SEQ ID NO: 1 in computer readable form, a skilled artisan can routinely access the sequence information for a variety of purposes. Computer software is publicly available which allows a skilled artisan to access sequence information provided in a computer readable medium. The examples which follow demonstrate how software which implements the BLAST (Altschul et al., J. Mol. Biol. 215:403-410 (1990)) and BLAZE (Brutlag et al., Comp. Chem. 17:203-207 (1993)) search algorithms on a Sybase system was used to identify open reading frames (ORFs) within the Haemophilus influenzae Rd genome which contain homology to ORFs or proteins from other organisms. Such ORFs are protein encoding fragments within the Haemophilus influenzae Rd genome and are useful in producing commercially important proteins such as enzymes used in fermentation reactions and in the production of commercially useful metabolites.

[0054] The present invention further provides systems, particularly computer-based systems, which contain the sequence information described herein. Such systems are designed to identify commercially important fragments of the Haemophilus influenzae Rd genome.

[0055] As used herein, “a computer-based system” refers to the hardware means, software means, and data storage means used to analyze the nucleotide sequence information of the present invention. The minimum hardware means of the computer-based systems of the present invention comprises a central processing unit (CPU), input means, output means, and data storage means. A skilled artisan can readily appreciate that any one of the currently available computer-based systems is suitable for use in the present invention.

[0056] As stated above, the computer-based systems of the present invention comprise a data storage means having stored therein a nucleotide sequence of the present invention and the necessary hardware means and software means for supporting and implementing a search means. As used herein, “data storage means” refers to memory which can store nucleotide sequence information of the present invention, or a memory access means which can access manufactures having recorded thereon the nucleotide sequence information of the present invention.

[0057] As used herein, “search means” refers to one or more programs which are implemented on the computer-based system to compare a target sequence or target structural motif with the sequence information stored within the data storage means. Search means are used to identify fragments or regions of the Haemophilus influenzae Rd genome which match a particular target sequence or target motif. A variety of known algorithms are disclosed publicly, and a variety of commercially available software for conducting search means is available and can be used in the computer-based systems of the present invention. Examples of such software include, but are not limited to, MacPattern (EMBL), BLASTN and BLASTX (NCBIA). A skilled artisan can readily recognize that any one of the available algorithms or implementing software packages for conducting homology searches can be adapted for use in the present computer-based systems.
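The idea of a search means can be illustrated with a minimal Python sketch that compares a target sequence with a stored sequence on both strands. It performs exact string matching only and is far simpler than the homology search programs named above; the function names, the 1-based coordinate convention and the toy sequence are assumptions made for illustration.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def search(genome, target):
    """Find exact occurrences of target on either strand of genome.

    Returns a sorted list of (position, strand) tuples. The position is
    the 1-based coordinate, on the forward strand, of the leftmost base of
    the match; the strand is '+' or '-'. A palindromic target is reported
    once for each strand.
    """
    genome = genome.upper()
    target = target.upper()
    hits = set()
    for strand, query in (("+", target), ("-", reverse_complement(target))):
        start = genome.find(query)
        while start != -1:
            hits.add((start + 1, strand))
            start = genome.find(query, start + 1)  # allow overlapping hits
    return sorted(hits)


if __name__ == "__main__":
    stored = "AAGTCTAAAGACTT"  # hypothetical toy sequence
    for position, strand in search(stored, "GTCT"):
        print(position, strand)
```

A practical search means would tolerate mismatches and gaps and would score each hit, which is what allows fragments to be ranked by degree of homology.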

[0058] As used herein, a “target sequence” can be any DNA or amino acid sequence of six or more nucleotides or two or more amino acids. A skilled artisan can readily recognize that the longer a target sequence is, the less likely a target sequence will be present as a random occurrence in the database. The most preferred sequence length of a target sequence is from about 10 to 100 amino acids or from about 30 to 300 nucleotide residues. However, it is well recognized that searches for commercially important fragments of the Haemophilus influenzae Rd genome, such as sequence fragments involved in gene expression and protein processing, may be of shorter length.
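The relationship between target length and chance matches can be made concrete with a short Python sketch. It assumes a simple null model that is not stated in the text: residues are independent and equally frequent, the sequence is treated as linear, and both strands are searched. The default length of 1,900,000 is the approximate genome size quoted in the background section; the function name is hypothetical.

```python
def expected_chance_hits(target_len, genome_len=1_900_000,
                         alphabet=4, strands=2):
    """Expected number of exact chance matches to a random target.

    Assumes independent, equally frequent residues (a simplification).
    Use alphabet=4 and strands=2 for DNA, or alphabet=20 and strands=1
    when searching a protein database of genome_len residues.
    """
    # Number of start positions at which a full-length match could begin.
    positions = max(genome_len - target_len + 1, 0)
    return strands * positions / alphabet ** target_len


if __name__ == "__main__":
    for length in (6, 10, 15, 30):
        print(length, expected_chance_hits(length))
```

Under this model each added nucleotide cuts the expected number of chance matches roughly fourfold, so a six-nucleotide target is expected to occur by chance many times in a sequence of this size, while a 30-nucleotide target is not.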

[0059] As used herein, “a target structural motif,” or “target motif,” refers to any rationally selected sequence or combination of sequences in which the sequence(s) are chosen based on a three-dimensional configuration which is formed upon the folding of the target motif. There are a variety of target motifs known in the art. Protein target motifs include, but are not limited to, enzymatic active sites and signal sequences. Nucleic acid target motifs include, but are not limited to, promoter sequences, hairpin structures and inducible expression elements (protein binding sequences).

[0060] A variety of structural formats for the input and output means can be used to input and output the information in the computer-based systems of the present invention. A preferred format for an output means ranks fragments of the Haemophilus influenzae Rd genome possessing varying degrees of homology to the target sequence or target motif. Such presentation provides a skilled artisan with a ranking of sequences which contain various amounts of the target sequence or target motif and identifies the degree of homology contained in the identified fragment.

[0061] A variety of comparing means can be used to compare a target sequence or target motif with the data storage means to identify sequence fragments of the Haemophilus influenzae Rd genome. In the present examples, software implementing the BLAST and BLAZE algorithms (Altschul et al., J. Mol. Biol. 215:403-410 (1990)) was used to identify open reading frames within the Haemophilus influenzae Rd genome. A skilled artisan can readily recognize that any one of the publicly available homology search programs can be used as the search means for the computer-based systems of the present invention.

[0062] One application of this embodiment is provided in FIG. 2. FIG. 2 provides a block diagram of a computer system 102 that can be used to implement the present invention. The computer system 102 includes a processor 106 connected to a bus 104. Also connected to the bus 104 are a main memory 108 (preferably implemented as random access memory, RAM) and a variety of secondary storage devices 110, such as a hard drive 112 and a removable medium storage device 114. The removable medium storage device 114 may represent, for example, a floppy disk drive, a CD-ROM drive, a magnetic tape drive, etc. A removable storage medium 116 (such as a floppy disk, a compact disk, a magnetic tape, etc.) containing control logic and/or data recorded therein may be inserted into the removable medium storage device 114. The computer system 102 includes appropriate software for reading the control logic and/or the data from the removable storage medium 116 once it is inserted in the removable medium storage device 114.

[0063] A nucleotide sequence of the present invention may be stored in a well known manner in the main memory 108, any of the secondary storage devices 110, and/or a removable storage medium 116. Software for accessing and processing the genomic sequence (such as search tools, comparing tools, etc.) resides in main memory 108 during execution.

[0064] Biochemical Embodiments

[0065] Another embodiment of the present invention is directed to isolated fragments of the Haemophilus influenzae Rd genome. The fragments of the Haemophilus influenzae Rd genome of the present invention include, but are not limited to, fragments which encode peptides, hereinafter open reading frames (ORFs), fragments which modulate the expression of an operably linked ORF, hereinafter expression modulating fragments (EMFs), fragments which mediate the uptake of a linked DNA fragment into a cell, hereinafter uptake modulating fragments (UMFs), and fragments which can be used to diagnose the presence of Haemophilus influenzae Rd in a sample, hereinafter diagnostic fragments (DFs).

[0066] As used herein, an “isolated nucleic acid molecule” or an “isolated fragment of the Haemophilus influenzae Rd genome” refers to a nucleic acid molecule possessing a specific nucleotide sequence which has been subjected to purification means to reduce, from the composition, the number of compounds which are normally associated with the composition. A variety of purification means can be used to generate the isolated fragments of the present invention. These include, but are not limited to, methods which separate constituents of a solution based on charge, solubility, or size.

[0067] In one embodiment, Haemophilus influenzae Rd DNA can be mechanically sheared to produce fragments of 15-20 kb in length. These fragments can then be used to generate a Haemophilus influenzae Rd library by inserting them into lambda clones as described in the Examples below. Primers flanking, for example, an ORF provided in Table 1(a) can then be generated using nucleotide sequence information provided in SEQ ID NO: 1. PCR cloning can then be used to isolate the ORF from the lambda DNA library. PCR cloning is well known in the art. Thus, given the availability of SEQ ID NO: 1, Table 1(a) and Table 2, it would be routine to isolate any ORF or other nucleic acid fragment of the present invention.

[0068] The isolated nucleic acid molecules of the present invention include, but are not limited to, single stranded and double stranded DNA, and single stranded RNA.

[0069] As used herein, an “open reading frame,” ORF, means a series of triplets coding for amino acids without any termination codons and is a sequence translatable into protein. Tables 1(a), 1(b) and 2 identify ORFs in the Haemophilus influenzae Rd genome. In particular, Table 1(a) indicates the location of ORFs within the Haemophilus influenzae genome which encode the recited protein based on homology matching with protein sequences from the organism appearing in parentheticals (see the fourth column of Table 1(a)).
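The definition of an ORF as a series of triplets without termination codons can be illustrated with a short Python sketch. The example scans the three forward reading frames of a sequence and reports maximal runs of in-frame triplets that contain no stop codon. It is an illustration of the definition only and is not the procedure used to produce Tables 1(a), 1(b) and 2. The function name `stop_free_runs` and the `min_codons` parameter are hypothetical. Practical ORF finders usually also require a start codon and scan the complementary strand, and neither step is shown here.

```python
# Termination codons of the standard genetic code (DNA alphabet).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def stop_free_runs(seq, min_codons=1):
    """Find maximal in-frame runs of triplets containing no stop codon.

    Returns (start, end, frame) tuples with 0-based start, exclusive end,
    and frame in {0, 1, 2}. Forward strand only.
    """
    seq = seq.upper()
    runs = []
    for frame in range(3):
        run_start = frame
        pos = frame
        while pos + 3 <= len(seq):
            if seq[pos:pos + 3] in STOP_CODONS:
                # A stop codon closes the current run.
                if (pos - run_start) // 3 >= min_codons:
                    runs.append((run_start, pos, frame))
                run_start = pos + 3
            pos += 3
        # Close the run that extends to the end of the sequence.
        if (pos - run_start) // 3 >= min_codons:
            runs.append((run_start, pos, frame))
    return runs


if __name__ == "__main__":
    for start, end, frame in stop_free_runs("ATGAAATAGCCC", min_codons=2):
        print(f"frame {frame}: bases {start}..{end} (end exclusive)")
```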

[0070] The first column of Table 1(a) provides the “GeneID” of a particular ORF. This information is useful for two reasons. First, the complete map of the Haemophilus influenzae Rd genome provided in FIGS. 6(A)-6(AN) refers to the ORFs according to their GeneID numbers. Second, Table 1(b) uses the GeneID numbers to indicate which ORFs were provided previously in a public database.

[0071] The second and third columns in Table 1(a) indicate an ORF's position in the nucleotide sequence provided in SEQ ID NO: 1. One of ordinary skill will recognize that ORFs may be oriented in opposite directions in the Haemophilus influenzae genome. This is reflected in columns 2 and 3.
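One way in which two coordinate columns can reflect orientation is sketched below in Python. The sketch assumes a convention that is not spelled out in this passage: coordinates are 1-based and inclusive, and a first coordinate greater than the second denotes an ORF read from the complementary strand. The function names `reverse_complement` and `orf_sequence` are hypothetical, and ORFs that span the origin of a circular genome are not handled.

```python
# Translation table mapping each DNA base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def orf_sequence(genome, start, stop):
    """Extract an ORF given 1-based inclusive coordinates.

    Assumed convention (not confirmed by the text): start <= stop means
    the ORF lies on the listed strand; start > stop means it lies on the
    complementary strand and is returned as a reverse complement.
    """
    if start <= stop:
        return genome[start - 1:stop]
    return reverse_complement(genome[stop - 1:start])


if __name__ == "__main__":
    genome = "AAATGCCC"
    print(orf_sequence(genome, 3, 5))  # listed strand
    print(orf_sequence(genome, 5, 3))  # complementary strand
```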

[0072] The fifth column of Table 1(a) indicates the percent identity of the protein encoded by an ORF to the corresponding protein from the organism appearing in parentheticals in the fourth column.

[0073] The sixth column of Table 1(a) indicates the percent similarity of the protein encoded by an ORF to the corresponding protein from the organism appearing in parentheticals in the fourth column. The concepts of percent identity and percent similarity of two polypeptide sequences are well understood in the art. For example, two polypeptides 10 amino acids in length which differ at three amino acid positions (e.g., at positions 1, 3 and 5) are said to have a percent identity of 70%. However, the same two polypeptides would be deemed to have a percent similarity of 80% if, for example, at position 5 the amino acid moieties, although not identical, were “similar” (i.e., possessed similar biochemical characteristics).
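The worked example above can be reproduced with a small Python sketch. The similarity groups used here are an illustrative assumption chosen for the example and are not the scoring scheme used to produce Table 1(a). The names `SIMILAR_GROUPS`, `are_similar` and `percent_identity_similarity` are hypothetical, and the two peptides are invented so that they differ at positions 1, 3 and 5 with a “similar” pair at position 5.

```python
# Illustrative groups of biochemically similar amino acids (an assumption).
SIMILAR_GROUPS = [
    set("ILVM"),  # aliphatic / hydrophobic
    set("FWY"),   # aromatic
    set("KRH"),   # basic
    set("DE"),    # acidic
    set("ST"),    # small hydroxyl
    set("NQ"),    # amide
]


def are_similar(a, b):
    """True if two residues are identical or fall in the same group."""
    return a == b or any(a in group and b in group for group in SIMILAR_GROUPS)


def percent_identity_similarity(p, q):
    """Return (percent identity, percent similarity) for equal-length peptides."""
    if len(p) != len(q):
        raise ValueError("peptides must be the same length (no gaps handled)")
    identical = sum(a == b for a, b in zip(p, q))
    similar = sum(are_similar(a, b) for a, b in zip(p, q))
    return 100 * identical / len(p), 100 * similar / len(p)


if __name__ == "__main__":
    first = "ACDEFGHIKL"
    second = "GCKEYGHIKL"  # differs at positions 1, 3 and 5; F/Y are similar
    identity, similarity = percent_identity_similarity(first, second)
    print(f"identity {identity:.0f}%, similarity {similarity:.0f}%")
```

Similar residues count toward percent similarity but not toward percent identity, so percent similarity is never lower than percent identity for the same pair of sequences.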

[0074] The seventh column in Table 1(a) indicates the length of the amino acid homology match.

[0075] Table 2 provides ORFs of the Haemophilus influenzae Rd genome which encode polypeptide sequences which did not elicit a “homology match” with a known protein sequence from another organism. Further details concerning the algorithms and criteria used for homology searches are provided in the Examples below.

[0076] A skilled artisan can readily identify ORFs in the Haemophilus influenzae Rd genome other than those listed in Tables 1(a), 1(b) and 2, such as ORFs which are overlapping or encoded by the opposite strand of an identified ORF, in addition to those ascertainable using the computer-based systems of the present invention.

[0077] As used herein, an “expression modulating fragment,” EMF, means a series of nucleotide molecules which modulates the expression of an operably linked ORF or EMF.

[0078] As used herein, a sequence is said to “modulate the expression of an operably linked sequence” when the expression of the sequence is altered by the presence of the EMF. EMFs include, but are not limited to, promoters and promoter modulating sequences (inducible elements). One class of EMFs comprises fragments which induce the expression of an operably linked ORF in response to a specific regulatory factor or physiological event. Known EMFs from Haemophilus are reviewed by Tomb et al., Gene 104:1-10 (1991), and Chandler, M. S., Proc. Natl. Acad. Sci. USA 89:1626-1630 (1992).

[0079] EMF sequences can be identified within the Haemophilus influenzae Rd genome by their proximity to the ORFs provided in Tables 1(a), 1(b) and 2. An intergenic segment, or a fragment of the intergenic segment, from about 10 to 200 nucleotides in length, taken 5′ from any one of the ORFs of Tables 1(a), 1(b), or 2 will modulate the expression of an operably linked 3′ ORF in a fashion similar to that found with the naturally linked ORF sequence. As used herein, an “intergenic segment” refers to the fragments of the Haemophilus genome which are between two ORF(s) herein described. Alternatively, EMFs can be identified using known EMFs as a target sequence or target motif in the computer-based systems of the present invention.
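Selecting a candidate segment 5′ of an ORF can be sketched in Python as follows. The sketch handles only an ORF on the listed strand of a linear sequence string, uses 1-based coordinates, and stops at the end of the preceding ORF so that the segment stays intergenic. The function name `upstream_segment` and its parameters are hypothetical. Whether a given segment actually modulates expression must still be determined experimentally.

```python
def upstream_segment(genome, orf_start, max_length=200, previous_orf_end=0):
    """Return up to max_length bases immediately 5' of an ORF.

    orf_start and previous_orf_end are 1-based positions on the listed
    strand; previous_orf_end=0 means no preceding ORF is known. The
    segment never extends into the preceding ORF.
    """
    segment_end = orf_start - 1  # last base before the ORF (1-based)
    segment_start = max(previous_orf_end + 1, segment_end - max_length + 1, 1)
    # Convert 1-based inclusive coordinates to a Python slice.
    return genome[segment_start - 1:segment_end]


if __name__ == "__main__":
    genome = "AAAAA" + "CCCCCCCCCC" + "ATGAAA"
    print(upstream_segment(genome, orf_start=16, max_length=10))
    print(upstream_segment(genome, orf_start=16, max_length=10, previous_orf_end=10))
```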

[0080] The presence and activity of an EMF can be confirmed using an EMF trap vector. An EMF trap vector contains a cloning site 5′ to a marker sequence. A marker sequence encodes an identifiable phenotype, such as antibiotic resistance or a complementing nutritional auxotrophic factor, which can be identified or assayed when the EMF trap vector is placed within an appropriate host under appropriate conditions. As described above, an EMF will modulate the expression of an operably linked marker sequence. A more detailed discussion of various marker sequences is provided below.

[0081] A sequence which is suspected of being an EMF is cloned in all three reading frames in one or more restriction sites upstream from the marker sequence in the EMF trap vector. The vector is then transformed into an appropriate host using known procedures and the phenotype of the transformed host is examined under appropriate conditions. As described above, an EMF will modulate the expression of an operably linked marker sequence.

[0082] As used herein, an “uptake modulating fragment,” UMF, means a series of nucleotide molecules which mediate the uptake of a linked DNA fragment into a cell. UMFs can be readily identified using known UMFs as a target sequence or target motif with the computer-based systems described above.

[0083] The presence and activity of a UMF can be confirmed by attaching the suspected UMF to a marker sequence. The resulting nucleic acid molecule is then incubated with an appropriate host under appropriate conditions and the uptake of the marker sequence is determined. As described above, a UMF will increase the frequency of uptake of a linked marker sequence. A review of DNA uptake in Haemophilus is provided by Goodgall, S. H., et al., J. Bact. 172:5924-5928 (1990).

[0084] As used herein, a “diagnostic fragment,” DF, means a series of nucleotide molecules which selectively hybridize to Haemophilus influenzae sequences. DFs can be readily identified by identifying unique sequences within the Haemophilus influenzae Rd genome, or by generating and testing probes or amplification primers consisting of the DF sequence in an appropriate diagnostic format which determines amplification or hybridization selectivity.

[0085] The sequences falling within the scope of the present invention are not limited to the specific sequences herein described, but also include allelic and species variations thereof. Allelic and species variations can be routinely determined by comparing the sequence provided in SEQ ID NO: 1, a representative fragment thereof, or a nucleotide sequence at least 99.9% identical to SEQ ID NO: 1 with a sequence from another isolate of the same species. Furthermore, to accommodate codon variability, the invention includes nucleic acid molecules coding for the same amino acid sequences as do the specific ORFs disclosed herein. In other words, in the coding region of an ORF, substitution of one codon for another which encodes the same amino acid is expressly contemplated.

[0086] Any specific sequence disclosed herein can be readily screened for errors by resequencing a particular fragment, such as an ORF, in both directions (i.e., sequencing both strands). Alternatively, error screening can be performed by sequencing corresponding polynucleotides of Haemophilus influenzae origin isolated by using part or all of the fragments in question as a probe or primer.

[0087] Each of the ORFs of the Haemophilus influenzae Rd genome disclosed in Tables 1(a), 1(b) and 2, and the EMF found 5′ to the ORF, can be used in numerous ways as polynucleotide reagents. The sequences can be used as diagnostic probes or diagnostic amplification primers to detect the presence of a specific microbe, such as Haemophilus influenzae Rd, in a sample. This is especially the case with the fragments or ORFs of Table 2, which will be highly selective for Haemophilus influenzae.

[0088] In addition, the fragments of the present invention, as broadly described, can be used to control gene expression through triple helix formation or antisense DNA or RNA, both of which methods are based on the binding of a polynucleotide sequence to DNA or RNA. Polynucleotides suitable for use in these methods are usually 20 to 40 bases in length and are designed to be complementary to a region of the gene involved in transcription (triple helix; see Lee et al., Nucl. Acids Res. 6:3073 (1979); Cooney et al., Science 241:456 (1988); and Dervan et al., Science 251:1360 (1991)) or to the mRNA itself (antisense; see Okano, J. Neurochem. 56:560 (1991); Oligodeoxynucleotides as Antisense Inhibitors of Gene Expression, CRC Press, Boca Raton, Fla. (1988)).
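The sequence of an antisense oligonucleotide follows directly from the coding sequence it targets. The Python sketch below derives a DNA oligonucleotide of 20 to 40 bases that is complementary to a chosen region of the coding strand, and hence to the corresponding mRNA. The function name `antisense_oligo`, its parameters and the example sequence are hypothetical. The sketch addresses only complementarity and does not consider melting temperature, secondary structure or specificity, which a practical design would also take into account.

```python
# Translation table mapping each DNA base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def antisense_oligo(coding_strand, start, length=20):
    """Return a DNA oligo complementary to coding_strand[start:start+length].

    start is a 0-based offset. The length is restricted to the 20 to 40
    base range mentioned in the text.
    """
    if not 20 <= length <= 40:
        raise ValueError("length must be between 20 and 40 bases")
    region = coding_strand[start:start + length]
    if len(region) < length:
        raise ValueError("target region runs past the end of the sequence")
    # The antisense strand, written 5' to 3', is the reverse complement.
    return reverse_complement(region)


if __name__ == "__main__":
    coding = "ATGGCTAGCTAGGATCCGATCGATCGTTAA"  # invented example sequence
    print(antisense_oligo(coding, start=0, length=20))
```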

[0089] Triple helix formation optimally results in a shut-off of RNA transcription from DNA, while antisense RNA hybridization blocks translation of an mRNA molecule into polypeptide. Both techniques have been demonstrated to be effective in model systems. Information contained in the sequences of the present invention is necessary for the design of an antisense or triple helix oligonucleotide.

[0090] The present invention further provides recombinant constructs comprising one or more fragments of the Haemophilus influenzae Rd genome of the present invention. The recombinant constructs of the present invention comprise a vector, such as a plasmid or viral vector, into which a fragment of the Haemophilus influenzae Rd genome has been inserted, in a forward or reverse orientation. In the case of a vector comprising one of the ORFs of the present invention, the vector may further comprise regulatory sequences, including for example, a promoter, operably linked to the ORF. For vectors comprising the EMFs and UMFs of the present invention, the vector may further comprise a marker sequence or heterologous ORF operably linked to the EMF or UMF. Large numbers of suitable vectors and promoters are known to those of skill in the art and are commercially available for generating the recombinant constructs of the present invention. The following vectors are provided by way of example. Bacterial: pBs, phagescript, PsiX174, pBluescript SK, pBs KS, pNH8a, pNH16a, pNH18a, pNH46a (Stratagene); pTrc99A, pKK223-3, pKK233-3, pDR540, pRIT5 (Pharmacia). Eukaryotic: pWLneo, pSV2cat, pOG44, pXT1, pSG (Stratagene); pSVK3, pBPV, pMSG, pSVL (Pharmacia).

[0091] Promoter regions can be selected from any desired gene using CAT (chloramphenicol acetyltransferase) vectors or other vectors with selectable markers. Two appropriate vectors are pKK232-8 and pCM7. Particular named bacterial promoters include lac, lacZ, T3, T7, gpt, lambda P_R, and trc. Eukaryotic promoters include CMV immediate early, HSV thymidine kinase, early and late SV40, LTRs from retrovirus, and mouse metallothionein-I. Selection of the appropriate vector and promoter is well within the level of ordinary skill in the art.

[0092] The present invention further provides host cells containing any one of the isolated fragments of the Haemophilus influenzae Rd genome of the present invention, wherein the fragment has been introduced into the host cell using known transformation methods. The host cell can be a higher eukaryotic host cell, such as a mammalian cell, a lower eukaryotic host cell, such as a yeast cell, or the host cell can be a prokaryotic cell, such as a bacterial cell. Introduction of the recombinant construct into the host cell can be effected by calcium phosphate transfection, DEAE-dextran mediated transfection, or electroporation (Davis, L. et al., Basic Methods in Molecular Biology (1986)).

[0093] The host cells containing one of the fragments of the Haemophilus influenzae Rd genome of the present invention can be used in conventional manners to produce the gene product encoded by the isolated fragment (in the case of an ORF) or can be used to produce a heterologous protein under the control of the EMF.

[0094] The present invention further provides isolated polypeptides encoded by the nucleic acid fragments of the present invention or by degenerate variants of the nucleic acid fragments of the present invention. By “degenerate variant” is intended nucleotide fragments which differ from a nucleic acid fragment of the present invention (e.g., an ORF) by nucleotide sequence but, due to the degeneracy of the Genetic Code, encode an identical polypeptide sequence. Preferred nucleic acid fragments of the present invention are the ORFs depicted in Table 1(a) which encode proteins.
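Whether one nucleotide fragment is a degenerate variant of another can be checked by translating both and comparing the results. The Python sketch below builds the standard genetic code table and applies that test. It assumes the standard codon assignments and in-frame sequences on the coding strand. The names `CODON_TABLE`, `translate` and `is_degenerate_variant` are hypothetical, and the short sequences are invented for illustration.

```python
from itertools import product

# Standard genetic code, listed in TCAG order; "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate an in-frame DNA coding sequence; trailing partial codons are ignored."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def is_degenerate_variant(first, second):
    """True if the sequences differ in nucleotides but encode the same polypeptide."""
    return first.upper() != second.upper() and translate(first) == translate(second)


if __name__ == "__main__":
    original = "ATGAAAGGT"  # Met-Lys-Gly
    variant = "ATGAAGGGC"   # synonymous codons for Lys and Gly
    print(translate(original), translate(variant))
    print(is_degenerate_variant(original, variant))
```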

[0095] A variety of methodologies known in the art can be utilized to obtain any one of the isolated polypeptides or proteins of the present invention. At the simplest level, the amino acid sequence can be synthesized using commercially available peptide synthesizers. This is particularly useful in producing small peptides and fragments of larger polypeptides. Fragments are useful, for example, in generating antibodies against the native polypeptide. In an alternative method, the polypeptide or protein is purified from bacterial cells which naturally produce the polypeptide or protein. One skilled in the art can readily follow known methods for isolating polypeptides and proteins in order to obtain one of the isolated polypeptides or proteins of the present invention. These include, but are not limited to, immunochromatography, HPLC, size-exclusion chromatography, ion-exchange chromatography, and immuno-affinity chromatography.

[0096] The polypeptides and proteins of the present invention can alternatively be purified from cells which have been altered to express the desired polypeptide or protein. As used herein, a cell is said to be altered to express a desired polypeptide or protein when the cell, through genetic manipulation, is made to produce a polypeptide or protein which it normally does not produce or which the cell normally produces at a lower level. One skilled in the art can readily adapt procedures for introducing and expressing either recombinant or synthetic sequences in eukaryotic or prokaryotic cells in order to generate a cell which produces one of the polypeptides or proteins of the present invention.

[0097] Any host/vector system can be used to express one or more of the ORFs of the present invention. These include, but are not limited to, eukaryotic hosts such as HeLa cells, CV-1 cells, COS cells, and Sf9 cells, as well as prokaryotic hosts such as E. coli and B. subtilis. The most preferred cells are those which do not normally express the particular polypeptide or protein or which express the polypeptide or protein at a low natural level.

[0098] “Recombinant,” as used herein, means that a polypeptide or protein is derived from recombinant (e.g., microbial or mammalian) expression systems. “Microbial” refers to recombinant polypeptides or proteins made in bacterial or fungal (e.g., yeast) expression systems. As a product, “recombinant microbial” defines a polypeptide or protein essentially free of native endogenous substances and unaccompanied by associated native glycosylation. Polypeptides or proteins expressed in most bacterial cultures, e.g., E. coli, will be free of glycosylation modifications; polypeptides or proteins expressed in yeast will have a glycosylation pattern different from that expressed in mammalian cells.

[0099] “Nucleotide sequence” refers to a heteropolymer of deoxyribonucleotides. Generally, DNA segments encoding the polypeptides and proteins provided by this invention are assembled from fragments of the Haemophilus influenzae Rd genome and short oligonucleotide linkers, or from a series of oligonucleotides, to provide a synthetic gene which is capable of being expressed in a recombinant transcriptional unit comprising regulatory elements derived from a microbial or viral operon.

[0100] “Recombinant expression vehicle or vector” refers to a plasmid or phage or virus or vector, for expressing a polypeptide from a DNA (RNA) sequence. The expression vehicle can comprise a transcriptional unit comprising an assembly of (1) a genetic element or elements having a regulatory role in gene expression, for example, promoters or enhancers, (2) a structural or coding sequence which is transcribed into mRNA and translated into protein, and (3) appropriate transcription initiation and termination sequences. Structural units intended for use in yeast or eukaryotic expression systems preferably include a leader sequence enabling extracellular secretion of translated protein by a host cell. Alternatively, where recombinant protein is expressed without a leader or transport sequence, it may include an N-terminal methionine residue. This residue may or may not be subsequently cleaved from the expressed recombinant protein to provide a final product.

[0101] “Recombinant expression system” means host cells which have stably integrated a recombinant transcriptional unit into chromosomal DNA or carry the recombinant transcriptional unit extrachromosomally. The cells can be prokaryotic or eukaryotic. Recombinant expression systems as defined herein will express heterologous polypeptides or proteins upon induction of the regulatory elements linked to the DNA segment or synthetic gene to be expressed.

[0102] Mature proteins can be expressed in mammalian cells, yeast, bacteria, or other cells under the control of appropriate promoters. Cell-free translation systems can also be employed to produce such proteins using RNAs derived from the DNA constructs of the present invention. Appropriate cloning and expression vectors for use with prokaryotic and eukaryotic hosts are described by Sambrook, et al., in Molecular Cloning: A Laboratory Manual, Second Edition, Cold Spring Harbor, N.Y. (1989), the disclosure of which is hereby incorporated by reference.

[0103] Generally, recombinant expression vectors will include origins of replication and selectable markers permitting transformation of the host cell, e.g., the ampicillin resistance gene of E. coli and the S. cerevisiae TRP1 gene, and a promoter derived from a highly-expressed gene to direct transcription of a downstream structural sequence. Such promoters can be derived from operons encoding glycolytic enzymes such as 3-phosphoglycerate kinase (PGK), α-factor, acid phosphatase, or heat shock proteins, among others. The heterologous structural sequence is assembled in appropriate phase with translation initiation and termination sequences, and preferably, a leader sequence capable of directing secretion of translated protein into the periplasmic space or extracellular medium. Optionally, the heterologous sequence can encode a fusion protein including an N-terminal identification peptide imparting desired characteristics, e.g., stabilization or simplified purification of expressed recombinant product.

[0104] Useful expression vectors for bacterial use are constructed by inserting a structural DNA sequence encoding a desired protein together with suitable translation initiation and termination signals in operable reading phase with a functional promoter. The vector will comprise one or more phenotypic selectable markers and an origin of replication to ensure maintenance of the vector and, if desirable, to provide amplification within the host. Suitable prokaryotic hosts for transformation include E. coli, Bacillus subtilis, Salmonella typhimurium and various species within the genera Pseudomonas, Streptomyces, and Staphylococcus, although others may also be employed as a matter of choice.

[0105] As a representative but nonlimiting example, useful expression vectors for bacterial use can comprise a selectable marker and bacterial origin of replication derived from commercially available plasmids comprising genetic elements of the well known cloning vector pBR322 (ATCC 37017). Such commercial vectors include, for example, pKK223-3 (Pharmacia Fine Chemicals, Uppsala, Sweden) and GEM 1 (Promega Biotec, Madison, Wis., USA). These pBR322 “backbone” sections are combined with an appropriate promoter and the structural sequence to be expressed.

[0106] Following transformation of a suitable host strain and growth of the host strain to an appropriate cell density, the selected promoter is derepressed by appropriate means (e.g., temperature shift or chemical induction) and cells are cultured for an additional period. Cells are typically harvested by centrifugation, disrupted by physical or chemical means, and the resulting crude extract retained for further purification.

[0107] Various mammalian cell culture systems can also be employed to express recombinant protein. Examples of mammalian expression systems include the COS-7 lines of monkey kidney fibroblasts, described by Gluzman, Cell 23:175 (1981), and other cell lines capable of expressing a compatible vector, for example, the C127, 3T3, CHO, HeLa and BHK cell lines. Mammalian expression vectors will comprise an origin of replication, a suitable promoter and enhancer, and also any necessary ribosome binding sites, polyadenylation site, splice donor and acceptor sites, transcriptional termination sequences, and 5′ flanking nontranscribed sequences. DNA sequences derived from the SV40 viral genome, for example, SV40 origin, early promoter, enhancer, splice, and polyadenylation sites may be used to provide the required nontranscribed genetic elements.

[0108] Recombinant polypeptides and proteins produced in bacterial culture are usually isolated by initial extraction from cell pellets, followed by one or more salting-out, aqueous ion exchange or size exclusion chromatography steps. Protein refolding steps can be used, as necessary, in completing configuration of the mature protein. Finally, high performance liquid chromatography (HPLC) can be employed for final purification steps. Microbial cells employed in expression of proteins can be disrupted by any convenient method, including freeze-thaw cycling, sonication, mechanical disruption, or use of cell lysing agents.

[0109] The present invention further includes isolated polypeptides, proteins and nucleic acid molecules which are substantially equivalent to those herein described. As used herein, substantially equivalent can refer both to nucleic acid and amino acid sequences, for example a mutant sequence, that varies from a reference sequence by one or more substitutions, deletions, or additions, the net effect of which does not result in an adverse functional dissimilarity between reference and subject sequences. For purposes of the present invention, sequences having equivalent biological activity and equivalent expression characteristics are considered substantially equivalent. For purposes of determining equivalence, truncation of the mature sequence should be disregarded.

[0110] The invention further provides methods of obtaining, from other strains of Haemophilus influenzae, homologs of the fragments of the Haemophilus influenzae Rd genome of the present invention and homologs of the proteins encoded by the ORFs of the present invention. As used herein, a sequence or protein of Haemophilus influenzae is defined as a homolog of a fragment of the Haemophilus influenzae Rd genome or a protein encoded by one of the ORFs of the present invention if it shares significant homology to one of the fragments of the Haemophilus influenzae Rd genome of the present invention or a protein encoded by one of the ORFs of the present invention. Specifically, by using the sequence disclosed herein as a probe or as primers, and techniques such as PCR cloning and colony/plaque hybridization, one skilled in the art can obtain homologs.

[0111] As used herein, two nucleic acid molecules or proteins are said to “share significant homology” if the two contain regions which possess greater than 85% sequence (amino acid or nucleic acid) homology.

[0112] Region specific primers or probes derived from the nucleotide sequence provided in SEQ ID NO: 1 or from a nucleotide sequence at least 99.9% identical to SEQ ID NO: 1 can be used to prime DNA synthesis and PCR amplification, as well as to identify colonies containing cloned DNA encoding a homolog using known methods (Innis et al., PCR Protocols, Academic Press, San Diego, Calif. (1990)).

[0113] When using primers derived from SEQ ID NO: 1 or from a nucleotide sequence at least 99.9% identical to SEQ ID NO: 1, one skilled in the art will recognize that by employing high stringency conditions (e.g., annealing at 50-60° C.) only sequences which are greater than 75% homologous to the primer will be amplified. By employing lower stringency conditions (e.g., annealing at 35-37° C.), sequences which are greater than 40-50% homologous to the primer will also be amplified.

[0114] When using DNA probes derived from SEQ ID NO: 1 or from a nucleotide sequence at least 99.9% identical to SEQ ID NO: 1 for colony/plaque hybridization, one skilled in the art will recognize that by employing high stringency conditions (e.g., hybridizing at 50-65° C. in 5×SSC and 50% formamide, and washing at 50-65° C. in 0.5×SSC), sequences having regions which are greater than 90% homologous to the probe can be obtained, and that by employing lower stringency conditions (e.g., hybridizing at 35-37° C. in 5×SSC and 40-45% formamide, and washing at 42° C. in SSC), sequences having regions which are greater than 35-45% homologous to the probe will be obtained.

[0115] Any organism can be used as the source for homologs of the present invention so long as the organism naturally expresses such a protein or contains genes encoding the same. The most preferred organisms for isolating homologs are bacteria which are closely related to Haemophilus influenzae Rd.

[0116] Uses for the Compositions of the Invention

[0117] Each ORF provided in Table 1(a) was assigned to one of 102 biological role categories adapted from Riley, M., Microbiology Reviews 57(4):862 (1993). This allows the skilled artisan to determine a use for each identified coding sequence. Table 1(a) further provides an identification of the type of polypeptide which is encoded by each ORF. As a result, one skilled in the art can use the polypeptides of the present invention for commercial, therapeutic and industrial purposes consistent with the type of putative identification of the polypeptide.

[0118] Such identifications permit one skilled in the art to use the Haemophilus influenzae ORFs in a manner similar to the known type of sequences for which the identification is made; for example, to ferment a particular sugar source or to produce a particular metabolite. (For a review of enzymes used within the commercial industry, see Biochemical Engineering and Biotechnology Handbook 2nd, eds. Macmillan Publ. Ltd., NY (1991) and Biocatalysts in Organic Syntheses, ed. J. Tramper et al., Elsevier Science Publishers, Amsterdam, The Netherlands (1985)).

[0119] Biosynthetic Enzymes

[0120] Open reading frames encoding proteins involved in mediating the catalytic reactions of intermediary and macromolecular metabolism, the biosynthesis of small molecules, cellular processes and other functions can be used for industrial biosynthesis. These proteins include enzymes involved in the degradation of the intermediary products of metabolism, enzymes involved in central intermediary metabolism, enzymes involved in respiration, both aerobic and anaerobic, enzymes involved in fermentation, enzymes involved in ATP-proton motive force conversion, enzymes involved in broad regulatory function, enzymes involved in amino acid synthesis, enzymes involved in nucleotide synthesis, and enzymes involved in cofactor and vitamin synthesis. The various metabolic pathways present in Haemophilus can be identified based on absolute nutritional requirements as well as by examining the various enzymes identified in Table 1(a).

[0121] Within the category of intermediary metabolism, a number of the proteins encoded by the ORFs identified in Table 1(a) are particularly involved in the degradation of intermediary metabolites as well as in non-macromolecular metabolism. Some of the enzymes identified include amylases, glucose oxidases, and catalase.

[0122] Proteolytic enzymes are another class of commercially important enzymes. Proteolytic enzymes find use in a number of industrial processes including the processing of flax and other vegetable fibers, in the extraction, clarification and depectinization of fruit juices, in the extraction of vegetable oil and in the maceration of fruits and vegetables to give unicellular fruits. A detailed review of the proteolytic enzymes used in the food industry is provided by Rombouts et al., Symbiosis 21:79 (1986) and Voragen et al. in Biocatalyst in Agricultural Biotechnology, edited by J. R. Whitaker et al., American Chemical Society Symposium Series 389:93 (1989).

[0123] The metabolism of glucose, galactose, fructose and xylose is an important part of the primary metabolism of Haemophilus. Enzymes involved in the degradation of these sugars can be used in industrial fermentation. Some of the important sugar transforming enzymes, from a commercial viewpoint, include sugar isomerases such as glucose isomerase. Other metabolic enzymes have found commercial use, such as glucose oxidases, which produce ketogulonic acid (KGA). KGA is an intermediate in the commercial production of ascorbic acid using Reichstein's procedure (see Krueger et al., Biotechnology 6(A), Rhine, H. J. et al., eds., Verlag Press, Weinheim, Germany (1984)).

[0124] Glucose oxidase (GOD) is commercially available and has been used in purified form as well as in an immobilized form for the deoxygenation of beer. See Hartmeir et al., Biotechnology Letters 1:21 (1979). The most important application of GOD is the industrial scale fermentation of gluconic acid. There is a market for gluconic acids, which are used in the detergent, textile, leather, photographic, pharmaceutical, food, feed and concrete industries (see Bigelis in Gene Manipulations and Fungi, Benett, J. W. et al., eds., Academic Press, New York (1985), p. 357). In addition to industrial applications, GOD has found applications in medicine for quantitative determination of glucose in body fluids and, recently, in biotechnology for analyzing syrups from starch and cellulose hydrolysates. See Owusu et al., Biochem. et Biophysica. Acta. 872:83 (1986).

[0125] The main sweetener used in the world today is sugar which comes from sugar beets and sugar cane. In the field of industrial enzymes, the glucose isomerase process shows the largest expansion in the market today. Initially, soluble enzymes were used and later immobilized enzymes were developed (Krueger et al., Biotechnology, The Textbook of Industrial Microbiology, Sinauer Associated Incorporated, Sunderland, Mass. (1990)). Today, the use of glucose-produced high fructose syrups is by far the largest industrial business using immobilized enzymes. A review of the industrial use of these enzymes is provided by Jorgensen, Starch 40:307 (1988).

[0126] Proteinases, such as alkaline serine proteinases, are used as detergent additives and thus represent one of the largest volumes of microbial enzymes used in the industrial sector. Because of their industrial importance, there is a large body of published and unpublished information regarding the use of these enzymes in industrial processes. (See Faultman et al., Acid Proteases Structure Function and Biology, Tang, J., ed., Plenum Press, New York (1977); Godfrey et al., Industrial Enzymes, MacMillan Publishers, Surrey, UK (1983); and Hepner et al., Report Industrial Enzymes by 1990, Hel Hepner & Associates, London (1986).)

[0127] Another class of commercially usable proteins of the present invention are the microbial lipases identified in Table 1 (see Macrae et al., Philosophical Transactions of the Royal Society of London 310:227 (1985) and Poserke, Journal of the American Oil Chemists' Society 61:1758 (1984)). A major use of lipases is in the fat and oil industry for the production of neutral glycerides using lipase catalyzed inter-esterification of readily available triglycerides. Applications of lipases include use as a detergent additive to facilitate the removal of fats from fabrics in the course of the washing procedures.

[0128] The use of enzymes, and in particular microbial enzymes, as catalysts for key steps in the synthesis of complex organic molecules is gaining popularity at a great rate. One area of great interest is the preparation of chiral intermediates. Preparation of chiral intermediates is of interest to a wide range of synthetic chemists, particularly those scientists involved with the preparation of new pharmaceuticals, agrochemicals, fragrances and flavors. (See Davies et al., Recent Advances in the Generation of Chiral Intermediates Using Enzymes, CRC Press, Boca Raton, Fla. (1990)). The following reactions catalyzed by enzymes are of interest to organic chemists: hydrolysis of carboxylic acid esters, phosphate esters, amides and nitriles, esterification reactions, trans-esterification reactions, synthesis of amides, reduction of alkanones and oxoalkanoates, oxidation of alcohols to carbonyl compounds, oxidation of sulfides to sulfoxides, and carbon bond forming reactions such as the aldol reaction. When considering the use of an enzyme encoded by one of the ORFs of the present invention for biotransformation and organic synthesis, it is sometimes necessary to consider the respective advantages and disadvantages of using a microorganism as opposed to an isolated enzyme. The pros and cons of using a whole cell system on the one hand or an isolated, partially purified enzyme on the other hand have been described in detail by Bud et al., Chemistry in Britain (1987), p. 127.

[0129] Amino transferases, enzymes involved in the biosynthesis and metabolism of amino acids, are useful in the catalytic production of amino acids. The advantage of using microbial based enzyme systems is that the amino transferase enzymes catalyze the stereo-selective synthesis of only L-amino acids and generally possess uniformly high catalytic rates. A description of the use of amino transferases for amino acid production is provided by Roselle-David, Methods of Enzymology 136:479 (1987).

[0130] Another category of useful proteins encoded by the ORFs of the present invention includes enzymes involved in nucleic acid synthesis, repair, and recombination. A variety of commercially important enzymes have previously been isolated from members of Haemophilus sp. These include the Hinc II, Hind III, and Hinf I restriction endonucleases. Table 1(a) identifies a wide array of enzymes, such as restriction enzymes, ligases, gyrases and methylases, which have immediate use in the biotechnology industry.
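
By way of illustration only, the published recognition sequences of the three Haemophilus restriction endonucleases named above (Hinc II, GTYRAC; Hind III, AAGCTT; Hinf I, GANTC) can be located in a nucleotide sequence with a short program. The following is a minimal Python sketch; the function name find_sites and the example sequence are hypothetical and are not taken from SEQ ID NO: 1, and cut positions within the sites are not modeled.

```python
import re

# IUPAC nucleotide codes needed for the three sites below.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "[AG]", "Y": "[CT]", "N": "[ACGT]"}

# Recognition sequences (5'->3'); all three are palindromic,
# so scanning one strand finds every site.
SITES = {"HincII": "GTYRAC", "HindIII": "AAGCTT", "HinfI": "GANTC"}


def find_sites(seq, enzyme):
    """Return 0-based start positions of every recognition site for `enzyme`."""
    pattern = "".join(IUPAC[base] for base in SITES[enzyme])
    # A lookahead is used so that overlapping sites are also reported.
    return [m.start() for m in re.finditer(f"(?=({pattern}))", seq.upper())]


# Hypothetical example sequence, chosen so each enzyme has one site.
example = "CCAAGCTTGGGTCAACGGGATTCAA"
hits = {enzyme: find_sites(example, enzyme) for enzyme in SITES}
print(hits)
```

The same scan could be applied to any fragment of the genome; a production tool would normally take site definitions from a curated enzyme database rather than a hand-written table.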

[0131] Generation of Antibodies

[0132] As described here, the proteins of the present invention, as well as homologs thereof, can be used in a variety of procedures and methods known in the art which are currently applied to other proteins. The proteins of the present invention can further be used to generate an antibody which selectively binds the protein. Such antibodies can be either monoclonal or polyclonal antibodies, as well as fragments of these antibodies, and humanized forms.

[0133] The invention further provides antibodies which selectively bind to one of the proteins of the present invention and hybridomas which produce these antibodies. A hybridoma is an immortalized cell line which is capable of secreting a specific monoclonal antibody.

[0134] In general, techniques for preparing polyclonal and monoclonal antibodies as well as hybridomas capable of producing the desired antibody are well known in the art (Campbell, A. M., Monoclonal Antibody Technology: Laboratory Techniques in Biochemistry and Molecular Biology, Elsevier Science Publishers, Amsterdam, The Netherlands (1984); St. Groth et al., J. Immunol. Methods 35:1-21 (1980); Kohler and Milstein, Nature 256:495-497 (1975)) and include the trioma technique and the human B-cell hybridoma technique (Kozbor et al., Immunology Today 4:72 (1983); Cole et al., in Monoclonal Antibodies and Cancer Therapy, Alan R. Liss, Inc. (1985), pp. 77-96).

[0135] Any animal (mouse, rabbit, etc.) which is known to produce antibodies can be immunized with the polypeptide encoded by an ORF of the present invention. Methods for immunization are well known in the art. Such methods include subcutaneous or intraperitoneal injection of the polypeptide. One skilled in the art will recognize that the amount of the protein encoded by the ORF of the present invention used for immunization will vary based on the animal which is immunized, the antigenicity of the peptide and the site of injection.

[0136] The protein which is used as an immunogen may be modified or administered in an adjuvant in order to increase the protein's antigenicity. Methods of increasing the antigenicity of a protein are well known in the art and include, but are not limited to, coupling the antigen with a heterologous protein (such as globulin or β-galactosidase) or the inclusion of an adjuvant during immunization.

[0137] For monoclonal antibodies, spleen cells from the immunized animals are removed, fused with myeloma cells, such as SP2/0-Ag14 myeloma cells, and allowed to become monoclonal antibody producing hybridoma cells.

[0138] Any one of a number of methods well known in the art can be used to identify the hybridoma cell which produces an antibody with the desired characteristics. These include screening the hybridomas with an ELISA assay, western blot analysis, or radioimmunoassay (Lutz et al., Exp. Cell Res. 175:109-124 (1988)).

[0139] Hybridomas secreting the desired antibodies are cloned and the class and subclass are determined using procedures known in the art (Campbell, A. M., Monoclonal Antibody Technology: Laboratory Techniques in Biochemistry and Molecular Biology, Elsevier Science Publishers, Amsterdam, The Netherlands (1984)).

[0140] Techniques described for the production of single chain antibodies (U.S. Pat. No. 4,946,778) can be adapted to produce single chain antibodies to proteins of the present invention.

[0141] For polyclonal antibodies, antibody-containing antiserum is isolated from the immunized animal and is screened for the presence of antibodies with the desired specificity using one of the above-described procedures.

[0142] The present invention further provides the above-described antibodies in detectably labeled form. Antibodies can be detectably labeled through the use of radioisotopes, affinity labels (such as biotin, avidin, etc.), enzymatic labels (such as horseradish peroxidase, alkaline phosphatase, etc.), fluorescent labels (such as FITC or rhodamine, etc.), paramagnetic atoms, etc. Procedures for accomplishing such labeling are well known in the art; for example, see Sternberger, L. A. et al., J. Histochem. Cytochem. 18:315 (1970); Bayer, E. A. et al., Meth. Enzym. 62:308 (1979); Engval, E. et al., Immunol. 109:129 (1972); Goding, J. W., J. Immunol. Meth. 13:215 (1976).

[0143] The labeled antibodies of the present invention can be used for in vitro, in vivo, and in situ assays to identify cells or tissues in which a fragment of the Haemophilus influenzae Rd genome is expressed.

[0144] The present invention further provides the above-described antibodies immobilized on a solid support. Examples of such solid supports include plastics such as polycarbonate, complex carbohydrates such as agarose and sepharose, and acrylic resins such as polyacrylamide and latex beads. Techniques for coupling antibodies to such solid supports are well known in the art (Weir, D. M. et al., “Handbook of Experimental Immunology” 4th Ed., Blackwell Scientific Publications, Oxford, England, Chapter 10 (1986); Jacoby, W. D. et al., Meth. Enzym. 34 Academic Press, N.Y. (1974)). The immobilized antibodies of the present invention can be used for in vitro, in vivo, and in situ assays as well as for immunoaffinity purification of the proteins of the present invention.

[0145] Diagnostic Assays and Kits

[0146] The present invention further provides methods to identify the expression of one of the ORFs of the present invention, or a homolog thereof, in a test sample, using one of the DFs or antibodies of the present invention.

[0147] In detail, such methods comprise incubating a test sample with one or more of the antibodies or one or more of the DFs of the present invention and assaying for binding of the DFs or antibodies to components within the test sample.

[0148] Conditions for incubating a DF or antibody with a test sample vary. Incubation conditions depend on the format employed in the assay, the detection methods employed, and the type and nature of the DF or antibody used in the assay. One skilled in the art will recognize that any one of the commonly available hybridization, amplification or immunological assay formats can readily be adapted to employ the DFs or antibodies of the present invention. Examples of such assays can be found in Chard, T., An Introduction to Radioimmunoassay and Related Techniques, Elsevier Science Publishers, Amsterdam, The Netherlands (1986); Bullock, G. R. et al., Techniques in Immunocytochemistry, Academic Press, Orlando, Fla. Vol. 1 (1982), Vol. 2 (1983), Vol. 3 (1985); Tijssen, P., Practice and Theory of Enzyme Immunoassays: Laboratory Techniques in Biochemistry and Molecular Biology, Elsevier Science Publishers, Amsterdam, The Netherlands (1985).

[0149] The test samples of the present invention include cells, protein or membrane extracts of cells, or biological fluids such as sputum, blood, serum, plasma, or urine. The test sample used in the above-described method will vary based on the assay format, the nature of the detection method and the tissues, cells or extracts used as the sample to be assayed. Methods for preparing protein extracts or membrane extracts of cells are well known in the art and can readily be adapted in order to obtain a sample which is compatible with the system utilized.

[0150] In another embodiment of the present invention, kits are provided which contain the necessary reagents to carry out the assays of the present invention.

[0151] Specifically, the invention provides a compartmentalized kit to receive, in close confinement, one or more containers, which comprises: (a) a first container comprising one of the DFs or antibodies of the present invention; and (b) one or more other containers comprising one or more of the following: wash reagents, reagents capable of detecting the presence of a bound DF or antibody.

[0152] In detail, a compartmentalized kit includes any kit in which reagents are contained in separate containers. Such containers include small glass containers, plastic containers or strips of plastic or paper. Such containers allow one to efficiently transfer reagents from one compartment to another compartment such that the samples and reagents are not cross-contaminated, and the agents or solutions of each container can be added in a quantitative fashion from one compartment to another. Such containers will include a container which will accept the test sample, a container which contains the antibodies used in the assay, containers which contain wash reagents (such as phosphate buffered saline, Tris-buffers, etc.), and containers which contain the reagents used to detect the bound antibody or DF.

[0153] Types of detection reagents include labeled nucleic acid probes, labeled secondary antibodies, or in the alternative, if the primary antibody is labeled, the enzymatic or antibody binding reagents which are capable of reacting with the labeled antibody. One skilled in the art will readily recognize that the disclosed DFs and antibodies of the present invention can be readily incorporated into one of the established kit formats which are well known in the art.

[0154] Screening Assay for Binding Agents

[0155] Using the isolated proteins of the present invention, the present invention further provides methods of obtaining and identifying agents which bind to a protein encoded by one of the ORFs of the present invention or to one of the fragments of the Haemophilus genome herein described.

[0156] In detail, said method comprises the steps of:

[0157] (a) contacting an agent with an isolated protein encoded by one of the ORFs of the present invention, or an isolated fragment of the Haemophilus genome; and

[0158] (b) determining whether the agent binds to said protein or said fragment.

[0159] The agents screened in the above assay can be, but are not limited to, peptides, carbohydrates, vitamin derivatives, or other pharmaceutical agents. The agents can be selected and screened at random or rationally selected or designed using protein modeling techniques.

[0160] For random screening, agents such as peptides, carbohydrates, pharmaceutical agents and the like are selected at random and are assayed for their ability to bind to the protein encoded by the ORF of the present invention.

[0161] Alternatively, agents may be rationally selected or designed. As used herein, an agent is said to be “rationally selected or designed” when the agent is chosen based on the configuration of the particular protein. For example, one skilled in the art can readily adapt currently available procedures to generate peptides, pharmaceutical agents and the like capable of binding to a specific peptide sequence in order to generate rationally designed antipeptide peptides (for example, see Hurby et al., “Application of Synthetic Peptides: Antisense Peptides,” in Synthetic Peptides, A User's Guide, W. H. Freeman, NY (1992), pp. 289-307, and Kaspczak et al., Biochemistry 28:9230-8 (1989)), pharmaceutical agents, or the like.

[0162] In addition to the foregoing, one class of agents of the present invention, as broadly described, can be used to control gene expression through binding to one of the ORFs or EMFs of the present invention. As described above, such agents can be randomly screened or rationally designed/selected. Targeting the ORF or EMF allows a skilled artisan to design sequence specific or element specific agents, modulating the expression of either a single ORF or multiple ORFs which rely on the same EMF for expression control.

[0163] One class of DNA binding agents are agents which contain base residues which hybridize or form a triple helix by binding to DNA or RNA. Such agents can be based on the classic phosphodiester, ribonucleic acid backbone, or can be a variety of sulfhydryl or polymeric derivatives which have base attachment capacity.

[0164] Agents suitable for use in these methods usually contain 20 to 40 bases and are designed to be complementary to a region of the gene involved in transcription (triple helix: see Lee et al., Nucl. Acids Res. 6:3073 (1979); Cooney et al., Science 241:456 (1988); and Dervan et al., Science 251:1360 (1991)) or to the mRNA itself (antisense: Okano, J. Neurochem. 56:560 (1991); Oligodeoxynucleotides as Antisense Inhibitors of Gene Expression, CRC Press, Boca Raton, Fla. (1988)). Triple helix formation optimally results in a shut-off of RNA transcription from DNA, while antisense RNA hybridization blocks translation of an mRNA molecule into polypeptide. Both techniques have been demonstrated to be effective in model systems. Information contained in the sequences of the present invention is necessary for the design of an antisense or triple helix oligonucleotide and other DNA binding agents.
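
The complementarity requirement for an antisense agent can be illustrated with a minimal Python sketch. It assumes the target region is supplied as the coding-strand DNA sequence (the same sequence as the mRNA, with T in place of U) and returns the reverse complement as a candidate oligodeoxynucleotide within the 20 to 40 base range recited above. The function name antisense_oligo and the example sequence are hypothetical; the sketch does not assess secondary structure, specificity or any other design criterion.

```python
# Watson-Crick complement table for the DNA alphabet.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def antisense_oligo(coding_seq, start, length=20):
    """Return the reverse complement of coding_seq[start:start+length].

    `coding_seq` is the coding-strand DNA (5'->3'); the result is written
    5'->3' and is complementary to the corresponding stretch of mRNA.
    """
    if not 20 <= length <= 40:
        raise ValueError("length should be between 20 and 40 bases")
    target = coding_seq[start:start + length].upper()
    if len(target) < length:
        raise ValueError("target region runs past the end of the sequence")
    if set(target) - set("ACGT"):
        raise ValueError("sequence contains characters other than A, C, G, T")
    return target.translate(_COMPLEMENT)[::-1]


# Hypothetical 24-base coding sequence used only for illustration.
example_cds = "ATGGCTAAAGGTGAACTGTTCACC"
oligo = antisense_oligo(example_cds, start=0, length=20)
print(oligo)
```

Taking the reverse complement of the returned oligonucleotide recovers the original target region, which is a convenient sanity check on any such design.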

[0165] Agents which bind to a protein encoded by one of the ORFs of the present invention can be used as diagnostic agents or in the control of bacterial infection by modulating the activity of the protein encoded by the ORF. Agents which bind to a protein encoded by one of the ORFs of the present invention can be formulated using known techniques to generate a pharmaceutical composition for use in controlling Haemophilus growth and infection.

[0166] Vaccine and Pharmaceutical Composition

[0167] The present invention further provides pharmaceutical agents which can be used to modulate the growth of Haemophilus influenzae, or another related organism, in vivo or in vitro. As used herein, a “pharmaceutical agent” is defined as a composition of matter which can be formulated using known techniques to provide a pharmaceutical composition. As used herein, the “pharmaceutical agents of the present invention” refers to the pharmaceutical agents which are derived from the proteins encoded by the ORFs of the present invention or are agents which are identified using the herein described assays.

[0168] As used herein, a pharmaceutical agent is said to “modulate the growth of Haemophilus sp., or a related organism, in vivo or in vitro,” when the agent reduces the rate of growth, rate of division, or viability of the organism in question. The pharmaceutical agents of the present invention can modulate the growth of an organism in many fashions, although an understanding of the underlying mechanism of action is not needed to practice the use of the pharmaceutical agents of the present invention. Some agents will modulate the growth by binding to an important protein, thus blocking the biological activity of the protein, while other agents may bind to a component of the outer surface of the organism, blocking attachment or rendering the organism more prone to attack by the body's natural immune system. Alternatively, the agent may comprise a protein encoded by one of the ORFs of the present invention and serve as a vaccine. The development and use of a vaccine based on outer membrane components, such as the LPS, are well known in the art.

[0169] As used herein, a “related organism” is a broad term which refers to any organism whose growth can be modulated by one of the pharmaceutical agents of the present invention. In general, such an organism will contain a homolog of the protein which is the target of the pharmaceutical agent or the protein used as a vaccine. As such, related organisms do not need to be bacterial but may be fungal or viral pathogens.

[0170] The pharmaceutical agents and compositions of the present invention may be administered in a convenient manner such as by the oral, topical, intravenous, intraperitoneal, intramuscular, subcutaneous, intranasal or intradermal routes. The pharmaceutical compositions are administered in an amount which is effective for treatment and/or prophylaxis of the specific indication. In general, they are administered in an amount of at least about 10 μg/kg body weight and in most cases they will be administered in an amount not in excess of about 8 mg/kg body weight per day. In most cases, the dosage is from about 10 μg/kg to about 1 mg/kg body weight daily, taking into account the routes of administration, symptoms, etc.
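
The weight-based ranges recited above convert to absolute daily amounts by simple multiplication. The following Python sketch is illustrative arithmetic only; the function name daily_dose_range_mg and the 70 kg body weight are assumptions chosen for the example and are not taken from the disclosure.

```python
def daily_dose_range_mg(body_weight_kg, low_ug_per_kg=10.0, high_ug_per_kg=1000.0):
    """Convert a per-kilogram daily range (in micrograms/kg) to milligrams per day.

    Defaults correspond to the "about 10 ug/kg to about 1 mg/kg" range in the text.
    """
    low_mg = low_ug_per_kg * body_weight_kg / 1000.0
    high_mg = high_ug_per_kg * body_weight_kg / 1000.0
    return low_mg, high_mg


# Hypothetical 70 kg subject (assumed value, for illustration only).
typical = daily_dose_range_mg(70.0)
# Same subject at the "not in excess of about 8 mg/kg" ceiling.
ceiling = daily_dose_range_mg(70.0, high_ug_per_kg=8000.0)[1]
print(typical, ceiling)
```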

[0171] The agents of the present invention can be used in native form or can be modified to form a chemical derivative. As used herein, a molecule is said to be a “chemical derivative” of another molecule when it contains additional chemical moieties not normally a part of the molecule. Such moieties may improve the molecule's solubility, absorption, biological half-life, etc. The moieties may alternatively decrease the toxicity of the molecule, eliminate or attenuate any undesirable side effect of the molecule, etc. Moieties capable of mediating such effects are disclosed in Remington's Pharmaceutical Sciences (1980).

[0172] For example, a change in the immunological character of the functional derivative, such as affinity for a given antibody, is measured by a competitive type immunoassay. Changes in immunomodulation activity are measured by the appropriate assay. Modifications of such protein properties as redox or thermal stability, biological half-life, hydrophobicity, susceptibility to proteolytic degradation or the tendency to aggregate with carriers or into multimers are assayed by methods well known to the ordinarily skilled artisan.

[0173] The therapeutic effects of the agents of the present invention may be obtained by providing the agent to a patient by any suitable means (e.g., inhalation, intravenously, intramuscularly, subcutaneously, enterally, or parenterally). It is preferred to administer the agent of the present invention so as to achieve an effective concentration within the blood or tissue in which the growth of the organism is to be controlled.

[0174] To achieve an effective blood concentration, the preferred method is to administer the agent by injection. The administration may be by continuous infusion, or by single or multiple injections.

[0175] In providing a patient with one of the agents of the present invention, the dosage of the administered agent will vary depending upon such factors as the patient's age, weight, height, sex, general medical condition, previous medical history, etc. In general, it is desirable to provide the recipient with a dosage of agent which is in the range of from about 1 pg/kg to 10 mg/kg (body weight of patient), although a lower or higher dosage may be administered. The therapeutically effective dose can be lowered by using combinations of the agents of the present invention or another agent.

[0176] As used herein, two or more compounds or agents are said to be administered “in combination” with each other when either (1) the physiological effects of each compound, or (2) the serum concentrations of each compound can be measured at the same time. The composition of the present invention can be administered concurrently with, prior to, or following the administration of the other agent.

[0177] The agents of the present invention are intended to be provided to recipient subjects in an amount sufficient to decrease the rate of growth (as defined above) of the target organism.

[0178] The administration of the agent(s) of the invention may be for either a “prophylactic” or “therapeutic” purpose. When provided prophylactically, the agent(s) are provided in advance of any symptoms indicative of the organism's growth. The prophylactic administration of the agent(s) serves to prevent, attenuate, or decrease the rate of onset of any subsequent infection. When provided therapeutically, the agent(s) are provided at (or shortly after) the onset of an indication of infection. The therapeutic administration of the agent(s) serves to attenuate the pathological symptoms of the infection and to increase the rate of recovery.

[0179] The agents of the present invention are administered to the mammal in a pharmaceutically acceptable form and in a therapeutically effective concentration. A composition is said to be “pharmacologically acceptable” if its administration can be tolerated by a recipient patient. Such an agent is said to be administered in a “therapeutically effective amount” if the amount administered is physiologically significant. An agent is physiologically significant if its presence results in a detectable change in the physiology of a recipient patient.

[0180] The agents of the present invention can be formulated according to known methods to prepare pharmaceutically useful compositions, whereby these materials, or their functional derivatives, are combined in admixture with a pharmaceutically acceptable carrier vehicle. Suitable vehicles and their formulation, inclusive of other human proteins, e.g., human serum albumin, are described, for example, in Remington's Pharmaceutical Sciences (16th ed., Osol, A., Ed., Mack, Easton Pa. (1980)). In order to form a pharmaceutically acceptable composition suitable for effective administration, such compositions will contain an effective amount of one or more of the agents of the present invention, together with a suitable amount of carrier vehicle.

[0181] Additional pharmaceutical methods may be employed to control the duration of action. Controlled release preparations may be achieved through the use of polymers to complex or absorb one or more of the agents of the present invention. The controlled delivery may be exercised by selecting appropriate macromolecules (for example polyesters, polyamino acids, polyvinylpyrrolidone, ethylenevinylacetate, methylcellulose, carboxymethylcellulose, or protamine sulfate) and the concentration of macromolecules as well as the methods of incorporation in order to control release. Another possible method to control the duration of action by controlled release preparations is to incorporate agents of the present invention into particles of a polymeric material such as polyesters, polyamino acids, hydrogels, poly(lactic acid) or ethylene vinylacetate copolymers. Alternatively, instead of incorporating these agents into polymeric particles, it is possible to entrap these materials in microcapsules prepared, for example, by coacervation techniques or by interfacial polymerization, for example, hydroxymethylcellulose or gelatin microcapsules and poly(methylmethacrylate) microcapsules, respectively, or in colloidal drug delivery systems, for example, liposomes, albumin microspheres, microemulsions, nanoparticles, and nanocapsules, or in macroemulsions. Such techniques are disclosed in Remington's Pharmaceutical Sciences (1980).

[0182] The invention further provides a pharmaceutical pack or kit comprising one or more containers filled with one or more of the ingredients of the pharmaceutical compositions of the invention. Associated with such container(s) can be a notice in the form prescribed by a governmental agency regulating the manufacture, use or sale of pharmaceuticals or biological products, which notice reflects approval by the agency of manufacture, use or sale for human administration. In addition, the agents of the present invention may be employed in conjunction with other therapeutic compounds.

[0183] Shot-Gun Approach to Megabase DNA Sequencing

[0184] The present invention further provides the first demonstration that a sequence of greater than one megabase can be sequenced using a random shotgun approach. This procedure, described in detail in the examples that follow, has eliminated the up-front cost of isolating and ordering overlapping or contiguous subclones prior to the start of the sequencing protocols.

[0185] Certain aspects of the present invention are described in greater detail in the non-limiting Examples that follow.

EXAMPLES

[0186] Experimental Design and Methods

[0187] 1. Shotgun Sequencing Strategy

[0188] The overall strategy for a shotgun approach to whole genome sequencing is outlined in Table 3. The theory of shotgun sequencing follows from the Lander and Waterman (Lander and Waterman, Genomics 2:231 (1988)) application of the equation for the Poisson distribution p_(x)=m^(x)e^(−m)/x!, where x is the number of occurrences of an event, m is the mean number of occurrences, and p_(x) is the probability of exactly x occurrences. Here the event is a sequencing read covering a given base, so p₀ is the probability that any given base is not sequenced after a certain amount of random sequence has been generated. If L is the genome length, n is the number of clone insert ends sequenced, and w is the sequencing read length, then m=nw/L, and the probability that no clone originates at any of the w bases preceding a given base, i.e., the probability that the base is not sequenced, is p₀=e^(−m). Using the fold coverage as the unit for m, one sees that after 1.8 Mb of sequence has been randomly generated, m=1, representing 1× coverage. In this case, p₀=e^(−1)=0.37, thus approximately 37% is unsequenced. For example, 5× coverage (approximately 9500 clones sequenced from both insert ends and an average sequence read length of 460 bp) yields p₀=e^(−5)=0.0067, or 0.67% unsequenced. The total gap length is Le^(−m), and the average gap size is L/n. 5× coverage would leave about 128 gaps averaging about 100 bp in size. The treatment is essentially that of Lander and Waterman, Genomics 2:231 (1988). Table 4 illustrates the coverage for a 1.9 Mb genome with an average fragment size of 460 bp.
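
The figures quoted above can be reproduced with a short calculation. The following Python sketch applies p₀=e^(−m), total gap length Le^(−m) and average gap size L/n exactly as stated; the expected number of gaps is obtained by dividing the total gap length by the average gap size, which gives ne^(−m). The function names are hypothetical, and the example uses the rounded values from the text (L=1.9 Mb, n=19,000 insert ends, m=5) rather than recomputing m from the read length.

```python
import math


def fold_coverage(n_reads, read_len, genome_len):
    """Mean coverage m = n*w/L."""
    return n_reads * read_len / genome_len


def shotgun_stats(genome_len, n_reads, m):
    """Poisson estimates for a random shotgun project at coverage m."""
    p0 = math.exp(-m)                      # probability a base is unsequenced
    total_gap = genome_len * p0            # L * e^(-m)
    avg_gap = genome_len / n_reads         # L / n
    return {
        "fraction_unsequenced": p0,
        "total_gap_length": total_gap,
        "average_gap_size": avg_gap,
        "expected_gaps": total_gap / avg_gap,  # equals n * e^(-m)
    }


# 1x coverage: about 37% of the genome remains unsequenced.
one_fold = math.exp(-1)

# 5x coverage with 9500 clones sequenced from both ends (19,000 reads).
stats = shotgun_stats(genome_len=1_900_000, n_reads=19_000, m=5)
for name, value in stats.items():
    print(f"{name}: {value:.4g}")
```

Calling fold_coverage(19_000, 460, 1_900_000) gives a slightly lower coverage than 5 because the text rounds both the genome length and the coverage; the sketch follows the rounded figures so that its output matches the paragraph.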

[0189] 2. Random Library Construction

[0190] In order to approximate the random model described above during actual sequencing, a nearly ideal library of cloned genomic fragments is required. The following library construction procedure was developed to achieve this.

[0191] H. influenzae Rd KW20 DNA was prepared by phenol extraction. A mixture (3.3 ml) containing 600 μg DNA, 300 mM sodium acetate, 10 mM Tris-HCl, 1 mM Na-EDTA, 30% glycerol was sonicated (Branson Model 450 Sonicator) at the lowest energy setting for 1 min. at 0° using a 3 mm probe. The DNA was ethanol precipitated and redissolved in 500 μl TE buffer. To create blunt-ends, a 100 μl aliquot was digested for 10 min. at 30° in 200 μl BAL31 buffer with 5 units BAL31 nuclease (New England BioLabs). The DNA was phenol-extracted, ethanol-precipitated, redissolved in 100 μl TE buffer, electrophoresed on a 1.0% low melting agarose gel, and the 1.6-2.0 kb size fraction was excised, phenol-extracted, and redissolved in 20 μl TE buffer. A two-step ligation procedure was used to produce a plasmid library with 97% inserts, of which >99% were single inserts. The first ligation mixture (50 μl) contained 2 μg of DNA fragments, 2 μg SmaI/BAP pUC18 DNA (Pharmacia), and 10 units T4 ligase (GIBCO/BRL), and incubation was at 14° for 4 hr. After phenol extraction and ethanol precipitation, the DNA was dissolved in 20 μl TE buffer and electrophoresed on a 1.0% low melting agarose gel. A ladder of ethidium bromide-stained linear bands, identified by size as insert (i), vector (v), v+i, v+2i, v+3i, . . . was visualized by 360 nm UV light, and the v+i DNA was excised and recovered in 20 μl TE. The v+i DNA was blunt-ended by T4 polymerase treatment for 5 min. at 37° in a reaction mixture (50 μl) containing the v+i linears, 500 μM each of the 4 dNTPs, and 9 units of T4 polymerase (New England BioLabs) under recommended buffer conditions. After phenol extraction and ethanol precipitation, the repaired v+i linears were dissolved in 20 μl TE. The final ligation to produce circles was carried out in a 50 μl reaction containing 5 μl of v+i linears and 5 units of T4 ligase at 14° overnight. After 10 min. at 70°, the reaction mixture was stored at −20°.

[0192] This two-stage procedure resulted in a molecularly random collection of single-insert plasmid recombinants with minimal contamination from double-insert chimeras (<1%) or free vector (<3%). Since deviation from randomness is most likely to occur during cloning, E. coli host cells deficient in all recombination and restriction functions (A. Greener, Strategies 3 (1):5 (1990)) were used to prevent rearrangements, deletions, and loss of clones by restriction. Transformed cells were plated directly on antibiotic diffusion plates to avoid the usual broth recovery phase which allows multiplication and selection of the most rapidly growing cells. Plating occurred as follows:

[0193] A 100 μl aliquot of Epicurian Coli SURE II Supercompetent Cells (Stratagene 200152) was thawed on ice and transferred to a chilled Falcon 2059 tube on ice. A 1.7 μl aliquot of 1.42 M β-mercaptoethanol was added to the aliquot of cells to a final concentration of 25 mM. Cells were incubated on ice for 10 min. A 1 μl aliquot of the final ligation was added to the cells and incubated on ice for 30 min. The cells were heat pulsed for 30 sec. at 42° and placed back on ice for 2 min. The outgrowth period in liquid culture was eliminated from this protocol in order to minimize the preferential growth of any given transformed cell. Instead the transformations were plated directly on a nutrient rich SOB plate containing a 5 ml bottom layer of SOB agar (1.5% SOB agar: 20 g tryptone, 5 g yeast extract, 0.5 g NaCl, 1.5% Difco Agar/L). The 5 ml bottom layer is supplemented with 0.4 ml ampicillin (50 mg/ml)/100 ml SOB agar. The 15 ml top layer of SOB agar is supplemented with 1 ml X-Gal (2%), 1 ml MgCl₂ (1 M), and 1 ml MgSO₄/100 ml SOB agar. The 15 ml top layer was poured just prior to plating. Our titer was approximately 100 colonies/10 μl aliquot of transformation.

[0194] All colonies were picked for template preparation regardless of size. Only clones lost due to “poison” DNA or deleterious gene products would be deleted from the library, resulting in a slight increase in gap number over that expected.

[0195] In order to evaluate the quality of the H. influenzae library, sequence data were obtained from approximately 4000 templates using the M13-21 primer. The random sequence fragments were assembled using the AutoAssembler™ software (Applied Biosystems division of Perkin-Elmer (AB)) after obtaining 1300, 1800, 2500, 3200, and 3800 sequence fragments, and the number of unique assembled base pairs was determined. Based on the equations described above, an ideal plot of the number of base pairs remaining to be sequenced as a function of the number of sequenced fragments obtained, with an average read length of 460 bp, was determined for a 2.5×10⁶ bp and a 1.9×10⁶ bp genome (FIG. 3). The progression of assembly was plotted using the actual data obtained from the assembly of up to 3800 sequence fragments and compared to the data provided in the ideal plot (FIG. 3). FIG. 3 illustrates that there was essentially no deviation of the actual assembly data from the ideal plot, indicating that we had constructed close to an ideal random library with minimal contamination from double insert chimeras and free vector.
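The ideal curve discussed in this paragraph can be approximated with a short calculation. The sketch below assumes that the equations referred to are the standard Poisson (Lander-Waterman) model, in which the probability that a base remains unsequenced after n random reads of length w from a genome of length L is exp(-nw/L). The function name is hypothetical and is not part of the software described here.

```python
import math


def expected_unsequenced_bp(genome_size, n_fragments, read_length):
    """Expected number of bases not yet covered after n random reads.

    Assumption: Poisson (Lander-Waterman) model, where the probability
    that a given base is missed by every read is exp(-n * w / L).
    """
    coverage = n_fragments * read_length / genome_size
    return genome_size * math.exp(-coverage)


if __name__ == "__main__":
    read_length = 460  # average read length stated in the text
    for genome_size in (1.9e6, 2.5e6):
        for n in (1300, 1800, 2500, 3200, 3800):
            remaining = expected_unsequenced_bp(genome_size, n, read_length)
            print(f"L={genome_size:.1e} bp, n={n}: {remaining:,.0f} bp remaining")
```

Plotting the printed values against n gives the kind of ideal curve against which actual assembly progress could be compared.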

[0196] 3. Random DNA Sequencing

[0197] High quality double stranded DNA plasmid templates (19,687) were prepared using a “boiling bead” method developed in collaboration with Advanced Genetic Technology Corp. (Gaithersburg, Md.) (Adams et al., Science 252:1651 (1991); Adams et al., Nature 355:632 (1992)). Plasmid preparation was performed in a 96-well format for all stages of DNA preparation from bacterial growth through final DNA purification. Template concentration was determined using Hoechst Dye and a Millipore Cytofluor. DNA concentrations were not adjusted, but low-yielding templates were identified where possible and not sequenced. Templates were also prepared from two H. influenzae lambda genomic libraries. An amplified library was constructed in vector Lambda GEM-12 (Promega) and an unamplified library was constructed in Lambda DASH II (Stratagene). In particular, for the unamplified lambda library, H. influenzae Rd KW20 DNA (>100 kb) was partially digested in a reaction mixture (200 μl) containing 50 μg DNA, 1× Sau3AI buffer, 20 units Sau3AI for 6 min. at 23°. The digested DNA was phenol-extracted and electrophoresed on a 0.5% low melting agarose gel at 2 V/cm for 7 hours. Fragments from 15 to 25 kb were excised and recovered in a final volume of 6 μl. One μl of fragments was used with 1 μl of DASH II vector (Stratagene) in the recommended ligation reaction. One μl of the ligation mixture was used per packaging reaction following the recommended protocol with the Gigapack II XL Packaging Extract (Stratagene, #227711). Phage were plated directly without amplification from the packaging mixture (after dilution with 500 μl of recommended SM buffer and chloroform treatment). Yield was about 2.5×10³ pfu/μl. The amplified library was prepared essentially as above except the lambda GEM-12 vector was used. After packaging, about 3.5×10⁴ pfu were plated on the restrictive NM539 host. The lysate was harvested in 2 ml of SM buffer and stored frozen in 7% dimethylsulfoxide. The phage titer was approximately 1×10⁹ pfu/ml.

[0198] Liquid lysates (10 ml) were prepared from randomly selected plaques and template was prepared on an anion-exchange resin (Qiagen). Sequencing reactions were carried out on plasmid templates using the AB Catalyst LabStation with Applied Biosystems PRISM Ready Reaction Dye Primer Cycle Sequencing Kits for the M13 forward (M13-21) and the M13 reverse (M13RP1) primers (Adams et al., Nature 368:474 (1994)). Dye terminator sequencing reactions were carried out on the lambda templates on a Perkin-Elmer 9600 Thermocycler using the Applied Biosystems Ready Reaction Dye Terminator Cycle Sequencing kits. T7 and SP6 primers were used to sequence the ends of the inserts from the Lambda GEM-12 library and T7 and T3 primers were used to sequence the ends of the inserts from the Lambda DASH II library. Sequencing reactions (28,643) were performed by eight individuals using an average of fourteen AB 373 DNA Sequencers per day over a 3 month period. All sequencing reactions were analyzed using the Stretch modification of the AB 373, primarily using a 34 cm well-to-read distance. The overall sequencing success rate was 84% for M13-21 sequences, 83% for M13RP1 sequences and 65% for dye-terminator reactions. The average usable read length was 485 bp for M13-21 sequences, 444 bp for M13RP1 sequences, and 375 bp for dye-terminator reactions. Table 5 summarizes the high-throughput sequencing phase of the invention.

[0199] Richards et al. (Richards et al., Automated DNA sequencing and Analysis, M. D. Adams, C. Fields, J. C. Venter, Eds. (Academic Press, London, 1994), Chap. 28.) described the value of using sequence from both ends of sequencing templates to facilitate ordering of contigs in shotgun assembly projects of lambda and cosmid clones. We balanced the desirability of both-end sequencing (including the reduced cost of lower total number of templates) against shorter read-lengths for sequencing reactions performed with the M13RP1 (reverse) primer compared to the M13-21 (forward) primer. Approximately one-half of the templates were sequenced from both ends. In total, 9,297 M13RP1 sequencing reactions were done. Random reverse sequencing reactions were done based on successful forward sequencing reactions. Some M13RP1 sequences were obtained in a semi-directed fashion: M13-21 sequences pointing outward at the ends of contigs were chosen for M13RP1 sequencing in an effort to specifically order contigs. The semi-directed strategy was effective, and clone-based ordering formed an integral part of assembly and gap closure (see below).

[0200] 4. Protocol for Automated Cycle Sequencing

[0201] The sequencing consisted of using eight ABI Catalyst robots and fourteen AB 373 Automated DNA Sequencers. The Catalyst robot is a publicly available sophisticated pipetting and temperature control robot which has been developed specifically for DNA sequencing reactions. The Catalyst combines pre-aliquoted templates and reaction mixes consisting of deoxy- and dideoxynucleotides, the Taq thermostable DNA polymerase, fluorescently-labelled sequencing primers, and reaction buffer. Reaction mixes and templates were combined in the wells of an aluminum 96-well thermocycling plate. Thirty consecutive cycles of linear amplification (i.e., one-primer synthesis) steps were performed, including denaturation, annealing of primer and template, and extension of DNA synthesis. A heated lid with rubber gaskets on the thermocycling plate prevented evaporation without the need for an oil overlay.

[0202] Two sequencing protocols were used: dye-labelled primers and dye-labelled dideoxy chain terminators. The shotgun sequencing involves use of four dye-labelled sequencing primers, one for each of the four terminator nucleotides. Each dye-primer is labeled with a different fluorescent dye, permitting the four individual reactions to be combined into one lane of the 373 DNA Sequencer for electrophoresis, detection, and base-calling. AB currently supplies pre-mixed reaction mixes in bulk packages containing all the necessary non-template reagents for sequencing. Sequencing can be done with both plasmid and PCR-generated templates with both dye-primers and dye-terminators with approximately equal fidelity, although plasmid templates generally give longer usable sequences.

[0203] Thirty-two reactions were loaded per 373 Sequencer each day, for a total of 960 samples. Electrophoresis was run overnight following the manufacturer's protocols, and the data was collected for twelve hours. Following electrophoresis and fluorescence detection, the AB 373 performs automatic lane tracking and base-calling. The lane-tracking was confirmed visually. Each sequence electropherogram (or fluorescence lane trace) was inspected visually and assessed for quality. Trailing sequences of low quality were removed and the sequence itself was loaded via software to a Sybase database (archived daily to an 8 mm tape). Leading vector polylinker sequence was removed automatically by a software program. Average edited lengths of sequences from the standard ABI 373 were around 400 bp and depended mostly on the quality of the template used for the sequencing reaction. All of the ABI 373 Sequencers were converted to Stretch Liners, which provided a longer electrophoresis path prior to fluorescence detection, thus increasing the average number of usable bases to 500-600 bp.

[0204] Informatics

[0205] 1. Data Management

[0206] A number of information management systems (LIMS) for a large-scale sequencing lab have been developed (Kerlavage et al., Proceedings of the Twenty-Sixth Annual Hawaii International Conference on System Sciences, IEEE Computer Society Press, Washington D.C., 585 (1993)). The system used to collect and assemble the sequence data was developed using the Sybase relational data management system and was designed to automate data flow wherever possible and to reduce user error. The database stores and correlates all information collected during the entire operation from template preparation to final analysis of the genome. Because the raw output of the AB 373 Sequencers was based on a Macintosh platform and the data management system chosen was based on a Unix platform, it was necessary to design and implement a variety of multi-user, client server applications which allow the raw data as well as analysis results to flow seamlessly into the database with a minimum of user effort. A description of the software programs used for large sequence assembly and management is provided in FIG. 4.

[0207] 2. Assembly

[0208] An assembly engine (TIGR Assembler) was developed for the rapid and accurate assembly of thousands of sequence fragments. The AB AutoAssembler™ was modified (and named TIGR Editor) to provide a graphical interface to the electropherogram for the purpose of editing data associated with the aligned sequence file output of TIGR Assembler. TIGR Editor maintains synchrony between the electropherogram files on the Macintosh platform and the sequence data in the H. influenzae database on the Unix platform.

[0209] The TIGR Assembler simultaneously clusters and assembles fragments of the genome. In order to obtain the speed necessary to assemble more than 10⁴ fragments, the algorithm builds a hash table of 10 bp oligonucleotide subsequences to generate a list of potential sequence fragment overlaps. The number of potential overlaps for each fragment determines which fragments are likely to fall into repetitive elements. Beginning with a single seed sequence fragment, TIGR Assembler extends the current contig by attempting to add the best matching fragment based on oligonucleotide content. The current contig and candidate fragment are aligned using a modified version of the Smith-Waterman algorithm (Waterman, M. S., Methods in Enzymology 164:765 (1988)) which provides for optimal gapped alignments. The current contig is extended by the fragment only if strict criteria for the quality of the match are met. The match criteria include the minimum length of overlap, the maximum length of an unmatched end, and the minimum percentage match. These criteria are automatically lowered by the algorithm in regions of minimal coverage and raised in regions with a possible repetitive element. Fragments representing the boundaries of repetitive elements and potentially chimeric fragments are often rejected based on partial mismatches at the ends of alignments and excluded from the current contig. TIGR Assembler is designed to take advantage of clone size information coupled with sequencing from both ends of each template. It enforces the constraint that sequence fragments from two ends of the same template point toward one another in the contig and are located within a certain range of base pairs (definable for each clone based on the known clone size range for a given library). Assembly of 24,304 sequence fragments of H. influenzae required 30 hours of CPU time using one processor on a SPARCenter 2000 with 512 Mb of RAM. This process resulted in approximately 210 contigs. Because of the high stringency of the TIGR Assembler, all contigs were searched against each other using grasta (a modified fasta (Pearson and Lipman, Proc. Natl. Acad. Sci. USA. 85:2444 (1988))). In this way, additional overlaps were detected which enabled compression of the data set into 140 contigs. The location of each fragment in the contigs and extensive information about the consensus sequence itself were loaded into the H. influenzae relational database.
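The oligonucleotide hash table step described in this paragraph can be illustrated with a minimal sketch. This is not the TIGR Assembler code: the function names are hypothetical, only the forward strand is indexed, and no alignment or match criteria are applied. It shows only how a table of 10 bp subsequences yields a list of candidate fragment pairs, together with a count of shared oligonucleotides per pair.

```python
from collections import defaultdict
from itertools import combinations


def build_kmer_index(fragments, k=10):
    """Map each k-mer to the set of fragment ids that contain it."""
    index = defaultdict(set)
    for frag_id, seq in fragments.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(frag_id)
    return index


def candidate_overlaps(fragments, k=10):
    """Count distinct shared k-mers for each pair of fragments sharing any."""
    shared = defaultdict(int)
    for ids in build_kmer_index(fragments, k).values():
        # Every pair of fragments containing this k-mer is a candidate overlap.
        for a, b in combinations(sorted(ids), 2):
            shared[(a, b)] += 1
    return dict(shared)


if __name__ == "__main__":
    # Toy fragments (illustrative only): f1 and f2 overlap by 13 bases.
    fragments = {
        "f1": "ACGTACGTTAGCCGATAGGCT",
        "f2": "TAGCCGATAGGCTTTACGGA",
        "f3": "GGGGGGGGGGGGGGGGGGGG",
    }
    print(candidate_overlaps(fragments))
```

In a fuller treatment, a fragment with an unusually large number of candidate partners would be flagged as likely to fall in a repetitive element, and each candidate pair would still need to pass a gapped alignment before being merged.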

[0210] 3. Ordering Assembled Contigs

[0211] After assembly the relative positions of the 140 contigs were unknown. The contigs were ordered by asm.align. Asm.align uses a number of relationships to identify and align contigs that are adjacent to each other. Using this algorithm, the 140 contigs were placed into 42 groups, leaving 42 physical gaps (no template DNA for the region) and 98 sequence gaps (template available for gap closure).

[0212] Ordering Contigs Separated by Physical Gaps and Achieving Closure

[0213] Four integrated strategies were developed to order contigs separated by physical gaps. Oligonucleotide primers were designed and synthesized from the end of each contig group. These primers were then available for use in one or more of the strategies outlined below:

[0214] Southern analysis was done to develop a unique “fingerprint” for a subset of 72 of the above oligonucleotides. This procedure was based upon the supposition that labeled oligonucleotides homologous to the ends of adjacent contigs should hybridize to common DNA restriction fragments, and thus share a similar or identical hybridization pattern or “fingerprint”. Oligonucleotides were labeled using 50 pmoles of each 20 mer and 250 mCi of [γ-³²P]ATP and T4 polynucleotide kinase. The labeled oligonucleotides were purified using Sephadex G-25 superfine (Pharmacia) and 10⁷ cpm of each was used in a Southern hybridization analysis of H. influenzae Rd chromosomal DNA digested with one frequent cutter (AseI) and five less frequent cutters (BglII, EcoRI, PstI, XbaI, and PvuII). The DNA from each digest was fractionated on a 0.7% agarose gel and transferred to Nytran Plus nylon membranes (Schleicher & Schuell). Hybridization was carried out for 16 hours at 40°. To remove non-specific signals, each blot was sequentially washed at room temperature with increasingly stringent conditions up to 0.1×SSC+0.5% SDS. Blots were exposed to a PhosphorImager cassette (Molecular Dynamics) for several hours and hybridization patterns were visually compared.

[0215] Adjacent contigs identified in this manner were targeted for specific PCR reactions.

[0216] Peptide links were made by searching each contig end using blastx (Altschul et al., J. Mol. Biol. 215:403 (1990)) against a peptide database. If the ends of two contigs matched the same database sequence in an appropriate manner, then the two contigs were tentatively considered to be adjacent to each other.

[0217] The two lambda libraries constructed from H. influenzae genomic DNA were probed with oligonucleotides designed from the ends of contig groups (Kirkness et al., Genomics 10:985 (1991)). The positive plaques were then used to prepare templates and the sequence was determined from each end of the lambda clone insert. These sequence fragments were searched using grasta against a database of all contigs. Two contigs that matched the sequence from the opposite ends of the same lambda clone were ordered. The lambda clone then provided the template for closure of the sequence gap between the adjacent contigs. The lambda clones were especially valuable for solving repeat structures.
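The clone-based ordering rule in this paragraph (two contigs are linked when opposite ends of one lambda clone match them) can be sketched as follows. The data layout and function name are hypothetical; the search step itself (grasta) is assumed to have already produced one best-matching contig per clone end.

```python
from collections import defaultdict


def link_contigs_by_clone_ends(end_hits):
    """Return contig pairs joined by the two ends of the same lambda clone.

    end_hits maps (clone_id, end_label) -> contig_id, where end_label is,
    for example, "T7" or "T3". Clones with only one placed end, or with
    both ends in the same contig, provide no ordering information and
    are skipped.
    """
    by_clone = defaultdict(dict)
    for (clone_id, end_label), contig_id in end_hits.items():
        by_clone[clone_id][end_label] = contig_id

    links = defaultdict(list)
    for clone_id, ends in by_clone.items():
        contigs = sorted(set(ends.values()))
        if len(ends) == 2 and len(contigs) == 2:
            links[tuple(contigs)].append(clone_id)
    return dict(links)


if __name__ == "__main__":
    # Illustrative hits only; clone and contig names are made up.
    hits = {
        ("clone1", "T7"): "contigA",
        ("clone1", "T3"): "contigB",
        ("clone2", "T7"): "contigA",
        ("clone2", "T3"): "contigA",
        ("clone3", "T7"): "contigC",
    }
    print(link_contigs_by_clone_ends(hits))
```

Each linking clone returned this way is also the natural template for closing the gap between the two contigs it joins.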

[0218] To confirm the order of contigs found by the other approaches and establish the order of non-ordered contigs, standard and long range (XL) PCR reactions were performed as follows.

[0219] Standard PCR was performed in the following manner. Each reaction contained a 37 μl cocktail: 16.5 μl H₂O, 3 μl 25 mM MgCl₂, 8 μl of a dNTP mix (1.25 mM each dNTP), 4.5 μl 10× PCR core buffer II (Perkin Elmer), 25 ng H. influenzae Rd KW20 genomic DNA. The appropriate two primers (4 μl, 3.2 pmole/μl) were added to each reaction. A hot start was performed at 95° for 5 min followed by a 75° hold. During the hold, Amplitaq DNA polymerase (Perkin Elmer), 0.3 μl in 4.3 μl H₂O, 0.5 μl 10× PCR core buffer II, was added to each reaction. The PCR profile was 25 cycles of 94°/45 sec., denature; 55°/1 min., anneal; 72°/3 min, extension. All reactions were performed in a 96 well format on a Perkin Elmer GeneAmp PCR System 9600.

[0220] Long range PCR (XL PCR) was performed as follows: Each reaction contained a 35.2 μl cocktail: 12.0 μl H₂O, 2.2 μl mM Mg(OAc)₂, 4 μl of a dNTP mix (200 μM final concentration), 12.0 μl 3.3×PCR buffer, 25 ng H. influenzae Rd KW20 genomic DNA. The appropriate two primers (5 μl, 3.2 pmoles/μl) were added to each reaction. A hot start was performed at 94° for 1 minute. rTth polymerase, 2.0 μl (4 U/reaction) in 2.8 μl 3.3×PCR buffer II, was added to each reaction. The PCR profile was 18 cycles of 94°/15 sec., denature; 62°/8 min., anneal and extend, followed by 12 cycles of 94°/15 sec., denature; 62°/8 min. (increase 15 sec./cycle), anneal and extend; 72°/10 min., final extension. All reactions were performed in a 96 well format on a Perkin Elmer GeneAmp PCR System 9600.

[0221] Although a PCR reaction was performed for essentially every combination of physical gap ends, techniques such as Southern fingerprinting, database matching, and the probing of large insert clones were particularly valuable in ordering contigs adjacent to each other and reducing the number of combinatorial PCR reactions necessary to achieve complete gap closure. Employing these strategies to an even greater extent in future genome projects will increase the overall efficiency of complete genome closure. The number of physical gaps ordered and closed by each of these techniques is summarized in Table 5.

[0222] Sequence information from the ends of 15-20 kb clones is particularly suitable for gap closure, solving repeat structures, and providing general confirmation of the overall genome assembly. We were also concerned that some fragments of the H. influenzae genome would be non-clonable in a high copy plasmid in E. coli. We reasoned that lytic lambda clones would provide the DNA for these segments. Approximately 100 random plaques were picked from the amplified lambda library, templates prepared, and sequence information obtained from each end. These sequences were searched (grasta) against the contigs and linked in the database to their appropriate contig, thus providing a scaffolding of lambda clones contributing additional support to the accuracy of the genome assembly (FIG. 5). In addition to confirmation of the contig structure, the lambda clones provided closure for 23 physical gaps. Approximately 78% of the genome is covered by lambda clones.

[0223] Lambda clones were also useful for solving repeat structures. Repeat structures identified in the genome were small enough to be spanned by a single clone from the random insert library, except for the six ribosomal RNA operons and one repeat (2 copies) which was 5,340 bp in length. Oligonucleotide probes were designed from the unique flanks at the beginning of each repeat and hybridized to the lambda libraries. Positive plaques were identified for each flank and the sequence fragments from the ends of each clone were used to correctly orient the repeats within the genome.

[0224] The ability to distinguish and assemble the six ribosomal RNA (rRNA) operons of H. influenzae (16S subunit-23S subunit-5S subunit) was a test of our overall strategy to sequence and assemble a complex genome which might contain a significant number of repeat regions. The high degree of sequence similarity and the length of the six operons caused the assembly process to cluster all the underlying sequences into a few indistinguishable contigs. To determine the correct placement of the operons in the sequence, a pair of unique flanking sequences was required for each. No unique flanking sequences could be found at the left (16S rRNA) ends. This region contains the ribosomal promoter and appeared to be non-clonable in the high copy number pUC18 plasmid. However, unique sequences could be identified at the right (5S) ends. Oligonucleotide primers were designed from these six flanking regions and used to probe the two lambda libraries. For each of the six rRNA operons at least one positive plaque was identified which completely spanned the rRNA operon and contained unique flanking sequence at the 16S and 5S ends. These plaques provided the templates for obtaining the unique sequence for each of the six rRNA operons.

[0225] An additional confirmation of the global structure of the assembled circular genome was obtained by comparing a computer generated restriction map based on the assembled sequence for the enzymes ApaI, SmaI, and RsrII with the predicted physical map of Redfield and Lee (Genetic Maps: locus maps of complex genomes, S. J. O'Brien, Ed. Cold Spring Harbor Laboratory Press, New York, N.Y., 1990, 2110.). The restriction fragments from the sequence-derived map matched those from the physical map in size and relative order (FIG. 5).

[0226] Editing

[0227] Simultaneous with the final gap filling process, each contig was edited visually by reassembling overlapping 10 kb sections of contigs using the AB AutoAssembler™ and the Fast Data Finder™ hardware. AutoAssembler™ provides a graphical interface to electropherogram data for editing. The electropherogram data was used to assign the most likely base at each position. Where a discrepancy could not be resolved or a clear assignment made, the automatic base calls were left unchanged. Individual sequence changes were written to the electropherogram files and a replication protocol (crash) was used to maintain the synchrony of sequence data between the H. influenzae database and the electropherogram files. Following editing, contigs were reassembled with TIGR Assembler prior to annotation.

[0228] Potential frameshifts identified in the course of annotating the genome were saved as reports in the database. These reports include the coordinates in a contig which the alignment software (praze) predicts to be the most likely location of a missing or inserted base and a representation of the sequence alignment containing the frameshift. Apparent frameshifts were used to indicate areas of the sequence which may require further editing. Frameshifts were not corrected in cases where clear electropherogram data disagreed with a frameshift. Frameshift editing was performed with TIGR Editor.

[0229] The rRNA and other repeat regions precluded complete assembly of the circular genome with TIGR Assembler. Final assembly of the genome was accomplished using comb_asm, which splices together contigs based on short overlaps.

[0230] Accuracy of the Genome Sequence

[0231] The accuracy of the H. influenzae genome sequence is difficult to quantitate because there is very little previously determined H. influenzae sequence and most of these sequences are from other strains. There are, however, three parameters of accuracy that can be applied to the data. First, the number of apparent frameshifts in predicted H. influenzae genes, based on database similarities, is 148. Some of these apparent frameshifts may be in the database sequences rather than in ours, particularly considering that 49 of the apparent frameshifts are based on matches to hypothetical proteins from other organisms. Second, there are 188 bases in the genome that remain as N ambiguities (1/9,735 bp). Combining these two types of “known” errors, we can calculate a maximum sequence accuracy of 99.98%. The average coverage is 6.5× and less than 1% of the genome is single-fold coverage.
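The accuracy figure in this paragraph can be reproduced from the stated counts. The short calculation below is a worked check rather than part of the described method. It takes the genome length of 1,830,121 bp from paragraph [0234] and assumes that each apparent frameshift and each N ambiguity counts as one erroneous base.

```python
# Figures taken from the text; variable names are illustrative.
GENOME_BP = 1_830_121         # genome length given in paragraph [0234]
APPARENT_FRAMESHIFTS = 148    # apparent frameshifts in predicted genes
N_AMBIGUITIES = 188           # bases remaining as N

# Assumption: each frameshift or ambiguity counts as one erroneous base.
known_errors = APPARENT_FRAMESHIFTS + N_AMBIGUITIES
max_accuracy = 1 - known_errors / GENOME_BP
bp_per_ambiguity = GENOME_BP / N_AMBIGUITIES

print(f"known errors: {known_errors}")
print(f"maximum accuracy: {max_accuracy:.2%}")
print(f"one N ambiguity per {bp_per_ambiguity:,.0f} bp")
```

The result agrees with the stated values of 99.98% and one ambiguity per 9,735 bp.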

[0232] Identifying Genes

[0233] An attempt was made to predict all of the coding regions of the H. influenzae Rd genome and identify genes, tRNAs and rRNAs, as well as other features of the DNA sequence (e.g., repeats, regulatory sites, replication origin sites, nucleotide composition). A description of some of the readily apparent sequence features is provided below.

[0234] The H. influenzae Rd genome is a circular chromosome of 1,830,121 bp. The overall G/C nucleotide content is approximately 38% (A=31%, C=19%, G=19%, T=31%, IUB=0.035%). The G/C content of the genome was examined with several window lengths to look for global structural features. With a window of 5,000 bp, the G/C content is relatively even except for 7 large G/C-rich regions and several A/T-rich regions (FIG. 5). The G/C rich regions correspond to six rRNA operons and the location of a cryptic mu-like prophage. Genes for several proteins with similarity to proteins encoded by bacteriophage mu are located at approximately position 1.56-1.59 Mbp of the genome. This area of the genome has a markedly higher G/C content than average for H. influenzae (~50% G/C compared to ~38% for the rest of the genome). No significance has yet been ascertained for the source or importance of the A/T rich regions.
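A windowed G/C scan of the kind described in this paragraph can be sketched in a few lines. The function below is illustrative only: its name and the non-overlapping default step are assumptions, it treats the sequence as linear rather than circular, and it counts IUB ambiguity codes as non-G/C.

```python
def gc_content_windows(sequence, window=5000, step=None):
    """Yield (start, gc_fraction) for successive full windows of a sequence.

    Assumptions: linear sequence (no wrap-around at the origin), and a
    default step equal to the window length (non-overlapping windows).
    """
    step = step or window
    seq = sequence.upper()
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        gc = chunk.count("G") + chunk.count("C")
        yield start, gc / window


if __name__ == "__main__":
    # Toy sequence with one G/C-rich block, scanned with a 4 bp window.
    toy = "ATATGCGCATAT"
    for start, frac in gc_content_windows(toy, window=4):
        print(start, frac)
```

Windows whose G/C fraction stands well above the genome-wide value of about 38% would correspond to features such as the rRNA operons and the mu-like prophage region noted in the text.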

[0235] The minimal origin of replication (oriC) in E. coli is a 245 bp region defined by three copies of a thirteen base pair repeat containing a GATC core sequence at one end and four copies of a nine base pair repeat containing a TTAT core sequence at the other end. The GATC sites are methylation targets and control replication while the TTAT sites provide the binding sites for DnaA, the first step in the replication process (Genes V, B. Lewin Ed. (Oxford University Press, New York, 1994), chap. 18-19). An approximately 281 bp sequence (602,483-602,764) whose limits are defined by these same core sequences appears to define the origin of replication in H. influenzae Rd. These coordinates lie between sets of ribosomal operons rrnF, rrnE, rrnD and rrnA, rrnB, rrnC. These two groups of ribosomal operons are transcribed in opposite directions and the placement of the origin is consistent with their polarity for transcription. Termination of E. coli replication is marked by two 23 bp termination sequences located 100 kb on either side of the midway point at which the two replication forks meet. Two potential termination sequences sharing a 10 bp core sequence with the E. coli termination sequence were identified in H. influenzae at coordinates 1,375,949-1,375,958 and 1,558,759-1,558,768. These two sets of coordinates are offset approximately 100 kb from the point 180° opposite of the proposed origin of H. influenzae replication.

[0236] Six rRNA operons were identified. Each rRNA operon contains three rRNA subunits and a variable spacer region in the order: 16S subunit-spacer region-23S subunit-5S subunit. The subunit lengths are 1539 bp, 2653 bp, and 116 bp, respectively. The G/C content of the three ribosomal subunits (50%) is higher than the genome as a whole. The G/C content of the spacer region (38%) is consistent with the remainder of the genome. The nucleotide sequence of the three rRNA subunits is 100% identical in all six ribosomal operons. The rRNA operons can be grouped into two classes based on the spacer region between the 16S and 23S sequences. The shorter of the two spacer regions is 478 bp in length (rrnB, rrnE, and rrnF) and contains the gene for tRNA Glu. The longer spacer is 723 bp in length (rrnA, rrnC, and rrnD) and contains the genes for tRNA Ile and tRNA Ala. The two sets of spacer regions are also 100% identical across each group of three operons. tRNA genes are also present at the 16S and 5S ends of two of the rRNA operons. The genes for tRNA Arg, tRNA His, and tRNA Pro are located at the 16S end of rrnE while the genes for tRNA Trp and tRNA Asp are located at the 5S end of rrnA.

[0237] The predicted coding regions of the H. influenzae genome were initially defined by evaluating their coding potential with the program Genemark (Borodovsky and McIninch, Computers Chem. 17(2):123 (1993)) using codon frequency matrices derived from 122 H. influenzae coding sequences in GenBank. The predicted coding region sequences (plus 300 bp of flanking sequence) were used in searches against a database of non-redundant bacterial proteins (NRBP) created specifically for the annotation. Redundancy was removed from NRBP at two stages. All DNA coding sequences were extracted from GenBank (release 85), and sequences from the same species were searched against each other. Sequences having >97% similarity over regions >100 nucleotides were combined. In addition, the sequences were translated and used in protein comparisons with all sequences in Swiss-Prot (release 30). Sequences belonging to the same species and having >98% similarity over 33 amino acids were combined. NRBP is composed of 21,445 sequences extracted from 23,751 GenBank sequences and 11,183 Swiss-Prot sequences from 1,099 different species.

[0238] A total of 1,749 predicted coding regions were identified. Searches of the H. influenzae predicted coding regions were performed using an algorithm that translates the query DNA sequence in the three plus-strand reading frames for searching against NRBP, identifies the protein sequences that match the query, and aligns the protein-protein matches using praze, a modified Smith-Waterman (Pearson and Lipman, Proc. Natl. Acad. Sci. U.S.A. 85:2444 (1988)) algorithm. In cases where insertions or deletions in the DNA sequence produced a frameshift error, the alignment algorithm started with protein regions of maximum similarity and extended the alignment to the same database match in alternative frames using the 300 bp flanking region. Regions known to contain frameshift errors were saved in the database and evaluated for possible correction. Unidentified predicted coding regions and the remaining intergenic sequences were searched against a dataset of all available peptide sequences from Swiss-Prot, PIR, and GenBank. Identification of operon structures will be facilitated by experimental determination of transcription promoter and termination sites.
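The first step named in this paragraph, translation of a query DNA sequence in the three plus-strand reading frames, can be sketched as below. This is an illustration, not the search algorithm itself: the function name is hypothetical, the standard genetic code is assumed, and any codon containing an ambiguity code is translated as X.

```python
from itertools import product

# Standard genetic code in TCAG order (first base varies slowest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate_plus_frames(dna):
    """Translate a DNA string in the three forward reading frames.

    Returns a list of three protein strings (frames +1, +2, +3). Stop
    codons are written as '*'; codons with non-ACGT symbols become 'X'.
    """
    dna = dna.upper()
    frames = []
    for offset in range(3):
        codons = (dna[i:i + 3] for i in range(offset, len(dna) - 2, 3))
        frames.append("".join(CODON_TABLE.get(c, "X") for c in codons))
    return frames


if __name__ == "__main__":
    print(translate_plus_frames("ATGGCCTAA"))
```

Keeping all three frames available is what allows an alignment to continue into an alternative frame when an insertion or deletion in the DNA has shifted the reading frame.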

[0239] Each putatively identified H. influenzae gene was assigned to one of 102 biological role categories adapted from Riley (Riley, M., Microbiology Reviews 57(4):862 (1993)). Assignments were made by linking the protein sequence of the predicted coding regions with the Swiss-Prot sequences in the Riley database. Of the 1,749 predicted coding regions, 724 have no role assignment. Of these, no database match was found for 384, while 340 matched “hypothetical proteins” in the database. Role assignments were made for 1,025 of the predicted coding regions. A compilation of all the predicted coding regions, their unique identifiers, a three-letter gene identifier, percent identity, percent similarity, and amino acid match length is presented in Table 1(a).

[0240] An annotated complete genome map of H. influenzae Rd is presented in FIGS. 6(A)-(AN). The map places each predicted coding region on the H. influenzae chromosome and indicates its direction of transcription.

[0241] A survey of the genes and their chromosomal organization in H. influenzae Rd makes possible a description of the metabolic processes H. influenzae requires for survival as a free-living organism, the nutritional requirements for its growth in the laboratory, and the characteristics that distinguish it from other organisms, specifically as they relate to its pathogenicity and virulence. The genome would be expected to have complete complements of certain classes of genes known to be essential for life. For example, there is a one-to-one correspondence of published E. coli ribosomal protein sequences to potential homologs in the H. influenzae database. Likewise, as shown in Table 1(a), an aminoacyl tRNA-synthetase is present in the genome for each amino acid. Finally, the location of tRNA genes was mapped onto the genome. There are 54 identified tRNA genes, including representatives of all 20 amino acids.

[0242] In order to survive as a free-living organism, H. influenzae must produce energy in the form of ATP via fermentation and/or electron transport. As a facultative anaerobe, H. influenzae Rd is known to ferment glucose, fructose, galactose, ribose, xylose and fucose (Dorocicz et al., J. Bacteriol. 175:7142 (1993)). The genes identified in Table 1(a) indicate that transport systems are available for the uptake of these sugars via the phosphoenolpyruvate-phosphotransferase system (PTS), and via non-PTS mechanisms. Genes that specify the common phosphate-carriers Enzyme I and Hpr (ptsI and ptsH) of the PTS system were identified, as well as the glucose-specific crr gene. The ptsH, ptsI, and crr genes constitute the pts operon. We have not, however, identified the gene encoding membrane-bound glucose-specific Enzyme II. The latter enzyme is required for transport of glucose by the PTS system. A complete PTS system for fructose was identified.

[0243] Genes encoding the complete glycolytic pathway and the production of fermentative end products were identified. The capacity for growth using anaerobic respiratory mechanisms was established by identifying genes encoding functional electron transport systems that use inorganic electron acceptors such as nitrates, nitrites, and dimethylsulfoxide. Genes encoding three enzymes of the tricarboxylic acid (TCA) cycle appear to be absent from the genome. Citrate synthase, isocitrate dehydrogenase, and aconitase were not found by searching the predicted coding regions or by using the E. coli enzymes as peptide queries against the entire genome in translation. This provides an explanation for the very high level of glutamate (1 g/L) which is required in defined culture media (Klein and Luginbuhl, J. Gen. Microbiol. 113:409 (1979)). Glutamate can be directed into the TCA cycle via conversion to alpha-ketoglutarate by glutamate dehydrogenase. In the absence of a complete TCA cycle, glutamate presumably serves as the source of carbon for biosynthesis of amino acids using precursors which branch from the TCA cycle. Functional electron transport systems are available for the production of ATP using oxygen as a terminal electron acceptor.

[0244] Previously unanswered questions regarding pathogenicity and virulence can be addressed by examining certain classes of genes such as adhesins and the lipooligosaccharide biogenesis genes. Moxon and co-workers (Weiser et al., Cell 59:657 (1989)) have obtained evidence that a number of these virulence-related genes contain tandem tetramer repeats which undergo frequent addition and deletion of one or more repeat units during replication such that the reading frame of the gene is changed and its expression thereby altered. It is now possible, using the complete genome sequence, to locate all such tandem repeat tracts (FIG. 5) and to begin to determine their roles in phase variation of such potential virulence genes.
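
As an illustration of how tandem tetramer repeat tracts might be located in a sequence, the following Python sketch uses a regular expression with a back-reference. The function name and the minimum of three repeat units are assumptions made for the example; they are not taken from the analysis behind FIG. 5.

```python
import re


def find_tetramer_repeats(dna, min_units=3):
    """Return (start, unit, unit_count) for each tandem tetramer tract.

    Positions are 0-based. min_units is an illustrative threshold,
    not a value from the original analysis. Homopolymer runs such as
    AAAAAAAAAAAA also match and may need to be filtered separately.
    """
    # ([ACGT]{4}) captures one unit; \1{n,} requires n further copies.
    pattern = re.compile(r"([ACGT]{4})\1{%d,}" % (min_units - 1))
    hits = []
    for match in pattern.finditer(dna.upper()):
        unit = match.group(1)
        hits.append((match.start(), unit, len(match.group(0)) // 4))
    return hits


if __name__ == "__main__":
    print(find_tetramer_repeats("GGCAATCAATCAATCAATGG"))
```

Because gain or loss of a single four-base unit shifts the reading frame, the unit count of each tract relative to the start codon indicates whether the downstream gene is in frame.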

[0245] H. influenzae Rd possesses a highly efficient natural DNA transformation system (Kahn and Smith, J. Membrane Biol. 138:155 (1984)). A unique DNA uptake sequence site, 5′ AAGTGCGGT, present in multiple copies in the genome, has been shown to be necessary for efficient DNA uptake. It is now possible to locate all of these sites and completely describe their distribution with respect to genic and intergenic regions. Fifteen genes involved in transformation have already been described and sequenced (Redfield, R., J. Bacteriol. 173:5612 (1991); Chandler, M., Proc. Natl. Acad. Sci. U.S.A. 89:1616 (1992); Barouki and Smith, J. Bacteriol. 163(2):629 (1985); Tomb et al., Gene 104:1 (1991); Tomb, J., Proc. Natl. Acad. Sci. U.S.A. 89:10252 (1992)). Six of the genes, comA to comF, comprise an operon which is under positive control by a 22-bp palindromic competence regulatory element (CRE) about one helix turn upstream of the promoter. The rec-2 transformation gene is also controlled by this element. It is now possible to locate additional copies of CRE in the genome and discover potential transformation genes under CRE control. In addition, it may now be possible to discover other global regulatory elements with an ease not previously possible.
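
Locating every copy of the uptake sequence amounts to an exact-match scan of both strands. The Python sketch below shows one way to do this; the function names are hypothetical, positions are 0-based, and the circular wrap-around of the chromosome is not handled.

```python
UPTAKE_SITE = "AAGTGCGGT"  # 5' AAGTGCGGT, as given in the text


def reverse_complement(dna):
    """Return the reverse complement of an A/C/G/T sequence."""
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_sites(genome, site=UPTAKE_SITE):
    """Return sorted (position, strand) pairs for every copy of site.

    A '-' hit is reported at the plus-strand coordinate where the
    reverse complement of the site begins. Overlapping copies are found.
    """
    genome = genome.upper()
    hits = []
    for strand, motif in (("+", site), ("-", reverse_complement(site))):
        start = genome.find(motif)
        while start != -1:
            hits.append((start, strand))
            start = genome.find(motif, start + 1)
    return sorted(hits)


if __name__ == "__main__":
    print(find_sites("CCAAGTGCGGTTTACCGCACTTGG"))
```

Comparing the resulting positions with the coordinates of the predicted coding regions would give the genic versus intergenic distribution.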

[0246] One well-described gene regulatory system in bacteria is the “two-component” system composed of a sensor molecule that detects some sort of environmental signal and a regulator molecule that is phosphorylated by the activated form of the sensor. The regulator protein is generally a transcription factor which, when activated by the sensor, turns on or off expression of a specific set of genes (for review, see Albright et al., Ann. Rev. Genet. 23:311 (1989); Parkinson and Kofoid, Ann. Rev. Genet. 26:71 (1992)). It has been estimated that E. coli harbors 40 sensor-regulator pairs (Albright et al., Ann. Rev. Genet. 23:311 (1989); Parkinson and Kofoid, Ann. Rev. Genet. 26:71 (1992)). The H. influenzae genome was searched with representative proteins from each family of sensor and regulator proteins using tblastn and tfasta. Four sensor and five regulator proteins were identified with similarity to proteins from other species (Table 6). There appears to be a corresponding sensor for each regulator protein except CpxR. Searches with the CpxA protein from E. coli identified three of the four sensors listed in Table 6, but no additional significant matches were found. It is possible that the level of sequence similarity is low enough to be undetectable with tfasta. No representatives of the NtrC-class of regulators were found. This class of proteins interacts directly with the sigma-54 subunit of RNA polymerase, which is not present in H. influenzae. All of the regulator proteins fall into the OmpR subclass (Albright et al., Ann. Rev. Genet. 23:311 (1989); Parkinson and Kofoid, Ann. Rev. Genet. 26:71 (1992)). The phoBR and basRS genes of H. influenzae are adjacent to one another and presumably form an operon. The nar and arc genes are not located adjacent to one another.

[0247] Some of the most interesting questions that can be answered by a complete genome sequence relate to what genes or pathways are absent. The non-pathogenic H. influenzae Rd strain varies significantly from the pathogenic serotype b strains. Many of the differences between these two strains appear in factors affecting infectivity. For example, the eight genes which make up the fimbrial gene cluster (van Ham et al., Mol. Microbiol. 13:673 (1994)) involved in adhesion of bacteria to host cells are now shown to be absent in the Rd strain. The pepN and purE genes which flank the fimbrial cluster in H. influenzae type b strains are adjacent to one another in the Rd strain (FIG. 7), suggesting that the entire fimbrial cluster was excised. On a broader level, we determined which E. coli proteins are not in H. influenzae by taking advantage of a non-redundant set of protein coding genes from E. coli, namely the University of Wisconsin Genome Project contigs in GenBank: 1,216 predicted protein sequences from GenBank accessions D10483, L10328, U00006, U00039, U14003, and U18997 (Yura et al., Nucleic Acids Research 20:3305 (1992); Burland et al., Genomics 16:551 (1993)). The minimum threshold for matches was set so that even weak matches would be scored as positive, thereby giving a minimal estimate of the E. coli genes not present in H. influenzae. tblastn was used to search each of the E. coli proteins against the complete genome. All blast scores >100 were considered matches. Altogether 627 E. coli proteins matched at least one region of the H. influenzae genome and 589 proteins did not. The 589 non-matching proteins were examined and found to contain a disproportionate number of hypothetical proteins from E. coli. Sixty-eight percent of the identified E. coli proteins were matched by an H. influenzae sequence whereas only 38% of the hypothetical proteins were matched. Proteins are annotated as hypothetical based on a lack of matches with any other known protein (Yura et al., Nucleic Acids Research 20:3305 (1992); Burland et al., Genomics 16:551 (1993)). At least two potential explanations can be offered for the overrepresentation of hypothetical proteins among those without matches: some of the hypothetical proteins are not, in fact, translated (at least in the annotated frame), or these are E. coli-specific proteins that are unlikely to be found in any species except those most closely related to E. coli, for example Salmonella typhimurium.
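
The score-threshold classification described above can be sketched in Python as follows. The input dictionary, its protein names, and its scores are hypothetical placeholders; running tblastn and parsing its output are not shown.

```python
def split_by_score(best_scores, threshold=100):
    """Split query proteins into matched and unmatched sets.

    best_scores maps each query protein name to its best blast score
    against the genome (hypothetical input). As in the text, only
    scores strictly greater than the threshold count as matches.
    """
    matched = {name for name, score in best_scores.items() if score > threshold}
    unmatched = set(best_scores) - matched
    return matched, unmatched


if __name__ == "__main__":
    # Toy scores for illustration only; these are not real results.
    toy_scores = {"protein_a": 250, "protein_b": 100, "protein_c": 40}
    matched, unmatched = split_by_score(toy_scores)
    print(sorted(matched), sorted(unmatched))

    # The reported counts are internally consistent: 627 + 589 = 1,216.
    assert 627 + 589 == 1216
```

Setting the threshold low, as the text notes, biases the result toward calling a match, so the unmatched set is a minimal estimate.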

[0248] A total of 384 predicted coding regions did not display significant similarity with a six-frame translation of GenBank release 87. These unidentified coding regions were compared to one another with fasta. Several novel gene families were identified. For example, two predicted coding regions without database matches (HI0591, HI0852) share 75% identity over almost their entire lengths (139 and 143 amino acid residues respectively). Their similarity to each other but failure to match any protein available in the current databases suggests that they could represent a novel cellular function.

[0249] Other types of analyses can be applied to the unidentified coding regions, including hydropathy analysis, which indicates the patterns of potential membrane-spanning domains that are often conserved between members of receptor and transporter gene families, even in the absence of significant amino acid identity. Five examples of unidentified predicted coding regions that display potential transmembrane domains with a periodic pattern that is characteristic of membrane-bound channel proteins are shown in FIG. 8. Such information can be used to focus on specific aspects of cellular function that are affected by targeted deletion or mutation of these genes.
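
A hydropathy profile of the kind referred to above can be computed with a sliding-window average. The sketch below is an assumption about one common approach: it uses the Kyte-Doolittle scale with a 19-residue window and a cutoff of 1.6, none of which is named in the text, and the function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values (assumed scale; the text does not
# state which scale was used for FIG. 8).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein, window=19):
    """Return the mean hydropathy of every full window along the protein.

    Raises KeyError for residues outside the 20 standard amino acids.
    """
    values = [KYTE_DOOLITTLE[aa] for aa in protein.upper()]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


def candidate_membrane_windows(protein, window=19, cutoff=1.6):
    """Return 0-based start positions of windows at or above the cutoff."""
    profile = hydropathy_profile(protein, window)
    return [i for i, score in enumerate(profile) if score >= cutoff]


if __name__ == "__main__":
    toy_protein = "L" * 19 + "D" * 19  # artificial sequence for illustration
    print(candidate_membrane_windows(toy_protein))
```

Runs of consecutive high-scoring windows, and their spacing along the protein, give the periodic pattern that suggests multiple membrane-spanning segments.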

[0250] Interest in the medically important aspects of H. influenzae biology has focused particularly on those genes which determine virulence characteristics of the organism. Recently, the catalase gene was characterized and sequenced as a possible virulence-related gene (Bishai et al., J. Bacteriol. 176:2914 (1994)). A number of the genes responsible for the capsular polysaccharide have been mapped and sequenced (Kroll et al., Mol. Microbiol. 5(6):1549 (1991)). Several outer membrane protein genes have been identified and sequenced (Langford et al., J. Gen. Microbiol. 138:155 (1992)). The lipooligosaccharide component of the outer membrane and the genes of its synthetic pathway are under intensive study (Weiser et al., J. Bacteriol. 173:3304 (1990)). While a vaccine is available, the study of outer membrane components is motivated to some extent by the need for improved vaccines.

[0251] Data Availability

[0252] The H. influenzae genome sequence has been deposited in the Genome Sequence DataBase (GSDB) with the accession number L42023. The nucleotide sequence and peptide translation of each predicted coding region with identified start and stop codons have also been accessioned by GSDB.

[0253] Production of an Antibody to a Haemophilus influenzae Protein

[0254] Substantially pure protein or polypeptide is isolated from the transfected or transformed cells using any one of the methods known in the art. The protein can also be produced in a recombinant prokaryotic expression system, such as E. coli, or can be chemically synthesized. Concentration of protein in the final preparation is adjusted, for example, by concentration on an Amicon filter device, to the level of a few micrograms/ml. Monoclonal or polyclonal antibody to the protein can then be prepared as follows:

[0255] Monoclonal Antibody Production by Hybridoma Fusion

[0256] Monoclonal antibody to epitopes of any of the peptides identified and isolated as described can be prepared from murine hybridomas according to the classical method of Kohler, G. and Milstein, C., Nature 256:495 (1975) or modifications of the methods thereof. Briefly, a mouse is repetitively inoculated with a few micrograms of the selected protein over a period of a few weeks. The mouse is then sacrificed, and the antibody-producing cells of the spleen isolated. The spleen cells are fused by means of polyethylene glycol with mouse myeloma cells, and the excess unfused cells destroyed by growth of the system on selective media comprising aminopterin (HAT media). The successfully fused cells are diluted and aliquots of the dilution placed in wells of a microtiter plate where growth of the culture is continued. Antibody-producing clones are identified by detection of antibody in the supernatant fluid of the wells by immunoassay procedures, such as ELISA, as originally described by Engvall, E., Meth. Enzymol. 70:419 (1980), and modified methods thereof. Selected positive clones can be expanded and their monoclonal antibody product harvested for use. Detailed procedures for monoclonal antibody production are described in Davis, L. et al., Basic Methods in Molecular Biology, Elsevier, New York, Section 21-2 (1989).

[0257] Polyclonal Antibody Production by Immunization

[0258] Polyclonal antiserum containing antibodies to heterogeneous epitopes of a single protein can be prepared by immunizing suitable animals with the expressed protein described above, which can be unmodified or modified to enhance immunogenicity. Effective polyclonal antibody production is affected by many factors related both to the antigen and the host species. For example, small molecules tend to be less immunogenic than others and may require the use of carriers and adjuvant. Also, host animals vary in response to site of inoculations and dose, with both inadequate and excessive doses of antigen resulting in low titer antisera. Small doses (ng level) of antigen administered at multiple intradermal sites appear to be most reliable. An effective immunization protocol for rabbits can be found in Vaitukaitis, J. et al., J. Clin. Endocrinol. Metab. 33:988-991 (1971).

[0259] Booster injections can be given at regular intervals, and antiserum harvested when the antibody titer thereof, as determined semi-quantitatively, for example, by double immunodiffusion in agar against known concentrations of the antigen, begins to fall. See, for example, Ouchterlony, O. et al., Chap. 19 in: Handbook of Experimental Immunology, Weir, D., ed., Blackwell (1973). Plateau concentration of antibody is usually in the range of 0.1 to 0.2 mg/ml of serum (about 12 μM). Affinity of the antisera for the antigen is determined by preparing competitive binding curves, as described, for example, by Fisher, D., Chap. 42 in: Manual of Clinical Immunology, second edition, Rose and Friedman, eds., Amer. Soc. for Microbiology, Washington, D.C. (1980).

[0260] Antibody preparations prepared according to either protocol are useful in quantitative immunoassays which determine concentrations of antigen-bearing substances in biological samples; they are also used semi-quantitatively or qualitatively to identify the presence of antigen in a biological sample.

[0261] Preparation of PCR Primers and Amplification of DNA

[0262] Various fragments of the Haemophilus influenzae Rd genome, such as those disclosed in Tables 1(a) and 2, can be used, in accordance with the present invention, to prepare PCR primers for a variety of uses. The PCR primers are preferably at least 15 bases, and more preferably at least 18 bases in length. When selecting a primer sequence, it is preferred that the primer pairs have approximately the same G/C ratio, so that melting temperatures are approximately the same. The PCR primers and amplified DNA of this Example find use in the Examples that follow.
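
The primer criteria above (minimum length, similar G/C content, similar melting temperature) can be checked with a short Python sketch. The melting temperature estimate uses the simple Wallace rule, Tm = 2(A+T) + 4(G+C), which is an assumption introduced here and not a method named in the text; the function names and the 4 degree tolerance are likewise hypothetical.

```python
def gc_fraction(primer):
    """Return the fraction of G and C bases in the primer."""
    primer = primer.upper()
    return (primer.count("G") + primer.count("C")) / len(primer)


def wallace_tm(primer):
    """Estimate Tm in degrees C with the Wallace rule (an assumption)."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


def compatible_pair(forward, reverse, min_len=15, max_tm_diff=4):
    """Check minimum length and that the two Tm estimates are close.

    min_len follows the "at least 15 bases" preference in the text;
    max_tm_diff is an illustrative tolerance, not a stated value.
    """
    long_enough = len(forward) >= min_len and len(reverse) >= min_len
    tm_close = abs(wallace_tm(forward) - wallace_tm(reverse)) <= max_tm_diff
    return long_enough and tm_close


if __name__ == "__main__":
    fwd = "ATGCATGCATGCATGCAT"  # toy sequences for illustration
    rev = "GGCCATATATGGCCATAT"
    print(gc_fraction(fwd), wallace_tm(fwd), compatible_pair(fwd, rev))
```

Two primers with the same length and G/C fraction get the same Wallace estimate, which is why matching the G/C ratio is a practical proxy for matching melting temperatures.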

[0263] Gene Expression from DNA Sequences Corresponding to ORFs

[0264] A fragment of the Haemophilus influenzae Rd genome provided in Tables 1(a) or 2 is introduced into an expression vector using conventional technology. (Techniques to transfer cloned sequences into expression vectors that direct protein translation in mammalian, yeast, insect or bacterial expression systems are well known in the art.) Commercially available vectors and expression systems are available from a variety of suppliers including Stratagene (La Jolla, Calif.), Promega (Madison, Wis.), and Invitrogen (San Diego, Calif.). If desired, to enhance expression and facilitate proper protein folding, the codon context and codon pairing of the sequence may be optimized for the particular expression organism, as explained by Hatfield et al., U.S. Pat. No. 5,082,767, incorporated herein by this reference.

[0265] The following is provided as one exemplary method to generate polypeptide(s) from cloned ORFs of the Haemophilus genome fragment. Since the ORF lacks a poly A sequence because of the bacterial origin of the ORF, this sequence can be added to the construct by, for example, splicing out the poly A sequence from pSG5 (Stratagene) using BglI and SalI restriction endonuclease enzymes and incorporating it into the mammalian expression vector pXT1 (Stratagene) for use in eukaryotic expression systems. pXT1 contains the LTRs and a portion of the gag gene from Moloney Murine Leukemia Virus. The position of the LTRs in the construct allows efficient stable transfection. The vector includes the Herpes Simplex thymidine kinase promoter and the selectable neomycin gene. The Haemophilus DNA is obtained by PCR from the bacterial vector using oligonucleotide primers complementary to the Haemophilus DNA and containing restriction endonuclease sequences for PstI incorporated into the 5′ primer and BglII at the 5′ end of the corresponding Haemophilus DNA 3′ primer, taking care to ensure that the Haemophilus DNA is positioned such that it is followed by the poly A sequence. The purified fragment obtained from the resulting PCR reaction is digested with PstI, blunt ended with an exonuclease, digested with BglII, purified and ligated to pXT1, now containing a poly A sequence and digested with BglII.
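
The primer layout described above, with a PstI site on the 5′ primer and a BglII site at the 5′ end of the 3′ primer, can be sketched in Python. The function names and the 18-base annealing length are assumptions for illustration; extra 5′ flanking bases that are often added to help the enzymes cut near a fragment end are omitted.

```python
PSTI_SITE = "CTGCAG"   # PstI recognition sequence
BGLII_SITE = "AGATCT"  # BglII recognition sequence


def reverse_complement(dna):
    """Return the reverse complement of an A/C/G/T sequence."""
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def design_primers(orf, anneal_len=18):
    """Return (forward, reverse) primers with 5' restriction-site tails.

    anneal_len is an illustrative choice, in line with the preference
    for primers of at least 18 bases stated in the earlier Example.
    """
    orf = orf.upper()
    if len(orf) < anneal_len:
        raise ValueError("ORF is shorter than the annealing length")
    forward = PSTI_SITE + orf[:anneal_len]
    reverse = BGLII_SITE + reverse_complement(orf[-anneal_len:])
    return forward, reverse


def internal_sites(orf):
    """Name the enzymes whose site also occurs inside the ORF.

    Both sites are palindromic, so checking one strand is sufficient.
    """
    orf = orf.upper()
    enzymes = (("PstI", PSTI_SITE), ("BglII", BGLII_SITE))
    return [name for name, site in enzymes if site in orf]


if __name__ == "__main__":
    toy_orf = "ATGAAACCCGGGTTTAAACCCGGGTTTTAA"  # artificial example
    print(design_primers(toy_orf), internal_sites(toy_orf))
```

An ORF that already contains either site would be cut internally during the digests, so a check such as `internal_sites` is worth running before ordering primers.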

[0266] The ligated product is transfected into mouse NIH 3T3 cells using Lipofectin (Life Technologies, Inc., Grand Island, N.Y.) under conditions outlined in the product specification. Positive transfectants are selected after growing the transfected cells in 600 μg/ml G418 (Sigma, St. Louis, Mo.). The protein is preferably released into the supernatant. However, if the protein has membrane binding domains, the protein may additionally be retained within the cell or expression may be restricted to the cell surface.

[0267] Since it may be necessary to purify and locate the transfected product, synthetic 15-mer peptides synthesized from the predicted Haemophilus DNA sequence are injected into mice to generate antibody to the polypeptide encoded by the Haemophilus DNA.

[0268] If antibody production is not possible, the Haemophilus DNA sequence is additionally incorporated into eukaryotic expression vectors and expressed as a chimeric with, for example, β-globin. Antibody to β-globin is used to purify the chimeric. Corresponding protease cleavage sites engineered between the β-globin gene and the Haemophilus DNA are then used to separate the two polypeptide fragments from one another after translation. One useful expression vector for generating β-globin chimerics is pSG5 (Stratagene). This vector encodes rabbit β-globin. Intron II of the rabbit β-globin gene facilitates splicing of the expressed transcript, and the polyadenylation signal incorporated into the construct increases the level of expression. These techniques as described are well known to those skilled in the art of molecular biology. Standard methods are published in methods texts such as Davis et al., and many of the methods are available from the technical assistance representatives from Stratagene, Life Technologies, Inc., or Promega. Polypeptide may additionally be produced from either construct using in vitro translation systems such as In vitro Express™ Translation Kit (Stratagene).

[0269] While the present invention has been described in some detail for purposes of clarity and understanding, one skilled in the art will appreciate that various changes in form and detail can be made without departing from the true scope of the invention.

[0270] All patents, patent applications and publications referred toabove are hereby incorporated by reference. TABLE 1(A) ORFs within theH. influenzae Rd genome FROM TO % Iden- % Simi- AA Gene ID NucleotideNucleotide Homology Match tity larity Length Amine acid biosynthesisGlutamate family HI0199 2O2598 204044 glulamate dehydrogenase (gdhA){Escherichia coli} 74.1 84.4 445 HI0857 915793 917833 glutaminesynthatase (glnA) {Proteus vulgaris} 70.7 85.9 467 HI1725 17924091759521 uridylyl transferase (glnD) {Escherichia coli} 46.6 67.8 854HI0813 861610 860240 argininosuccinate lyase (arginosuccinase) (asal)(argH) {Escherichia coli} 73.5 84.5 457 HI1733 1799112 1800443arginincsuccinate synthatase (argG) {Escherichia coli} 78.6 87.5 438HI0598 618753 617752 ornithine carbomoyltransferase (arcB) {Pseudomonasaeruginosa} 82.3 90.7 334 HI1242 1313013 1311763 gamma-glutamylphosphate reductase (proA) {Escherichia coli} 61.7 79.4 406 HI0902955518 956621 glutamate 5-kinase (gamma-glutamyl kinase) (proB){Escherichia coli} 65.7 80.2 363 Aspartate family HI0288 319209 320419aspartate aminotransferase (aspC) {Bacillus sp.} 31.1 53.8 349 HI16231684147 1685334 aspartate aminotransferase (aspC) {Escherichia coli}62.6 79.0 396 HI0566 582379 583368 asparagine synthetase A (asnA){Escherichia coli} 53.3 77.0 330 HI0648 690744 689522aspartate-semialdehyde dehydrogenase (asd) {Escherichia coli} 71.9 84.9367 HI1311 1385700 1386509 dehydredipicolinate reductase (dapB){Escherichia coli} 70.3 82.5 269 HI0729 779456 778212 diaminopimelatedecarboxylase (dap decarboxylase) (lysA) {Pseudomonas 57.6 78.8 413aeruginosa} HI0752 810250 811071 diaminopimelate epimerase (dapF){Escherichia coli} 77.0 85.8 274 HI0256 284972 285855dihydrodipicolinate synthetase (dapA) {Escherichia coli} 58.2 79.8 292HI1638 1692968 1694330 lysine-sensitive aspartokinase III (lysC){Escherichia coli} 55.3 73.2 449 HI0102 109226 108096succinyl-diaminopimelate desuccinylase (dapE) {Escherichia coli} 61.679.7 374 HI1640 1696728 
1695820 tetrahydrodipicolinateN-succinyltransferase (dapD) {Actinobacillus 96.7 98.5 273pleuropneumoniae} HI0089 96280 93826 aspartokinase-homoserinedehydregenase (thrA) {Serratia marcescens} 62.2 77.4 814 HI0088 9282092879 homeserine kinase (thrB) {Serratia marcescens} 61.8 80.6 306HI0087 92822 91559 threonine synthase (thrC) {Serratia marcescens} 67.080.9 425 HI1044 1107725 1105876 B12-dependenthomocysteine-NS-methyltetrahydrofolate transmethylase 54.2 70.4 1217(metH) {Escherichia coli} HI0122 137932 136745 beta-cystathionase (melC){Escherichia coli} 65.4 84.1 390 HI0086 90743 89601 cystathioninegamma-synthase (mrtB) (Escherichia coli} 41.9 62.2 374 HI1266 13399831341056 homoserine acetyltransferase (met2) {Saccharomyces cerevisiae}38.1 57.1 387 HI1708 1772488 1771221 tetrahydropteroyltriglutamate metE){Escherichia coli} 52.4 68.0 747 Serine family HI0891 942266 943628serine hydroxymethyltransferase (serine methylase) (glyA){Actinobacillus 85.7 93.6 419 actinomycetemcomitans} HI0467 486594487523 phosphoglycerate dehydrogenase (serA) {Escherichia coli} 71.183.9 408 HI1170 1238587 1237502 phosphoserine aminotransferase (serC){Escherichia coli} 53.4 72.3 358 HI1035 1097572 1098514 phosphoserinephosphatase (o-phosphoserine phosphohydrolase) (serB) 52.3 69.5 303{Escherichia coli} HI1105 1165130 1166077 cysteine synthetase (cysK){Escherichia coli} 70.0 83.9 309 HI0608 636187 636987 serineacetyltransferase (cysE) {Escherichia coli} 73.0 88.3 256 Aromatic aminoacid family HI0972 1026936 1027382 2-dehydroquinase (aroO){Actinobacillus pleuropneumoniae} 67.1 82.5 143 HI0209 222169 2222543-dehydroquinate synthase (aroB) {Escherichia coli} 62.1 76.7 356 HI0197211424 212494 chorismate synthase (aroC) {Escherichia coli} 77.3 88.4350 HI0609 637000 637812 dehydroquinasa shikimate dehydrogenase{Nicotiana tabacum} 30.0 51.5 242 HI1595 1656463 1657758enolpyruvylshikimatephosphatesynthase (aroA) {Haemophilus influenzae}97.7 98.4 432 HI0657 698929 698124 shikimate 5-dehydrogenase 
(aroE){Escherichia coli} 49.1 70.1 270 HI0208 221607 222146 shikimic acidkinase I (aroK) {Escherichia coli} 75.0 87.5 104 HI1148 1213767 1214921chorismate mutase/prephenate dehydralase pheA polypeptide (pheA) 54.374.7 375 {Escherichia coli} HI1553 1618339 1617254 DAHP synthetase(phenylalanine repressible) (aroG) {Escherichia coli} 72.0 83.8 345HI1293 1370448 1371578 chorismate mutase (tyrA) {Erwinia herbicola} 58.676.8 366 HI1392 1481917 1483470 anthranilate synthase component I (trpE){Escherichia coli} 52.9 73.2 494 HI1393 1483718 1485554 anthranilatesynthase component II (trpD) {Escherichia coli} 56.6 74.2 452 HI11741240757 1241335 anthranilate synthase glutamine amidotransferase (trpG){Acinetobacter 34.0 59.0 191 calcoaceticus} HI1437 1519794 1520597tryptophan synthase alpha chain (trpA) {Salmonella typhimurium} 57.872.8 267 HI1436 1518601 1519791 tryptophan synthase, beta chain (trpB){Escherichia coli} 82.4 90.3 391 HI0474 494758 495354 amidotransferase(hisH) {Escherichia coli} 55.9 70.3 195 HI0470 490033 490941 ATPphosphoribosyltransferase (hisG) {Escherichia coli} 72.2 82.0 295 HI0476496124 496897 hisF cyclase (hisF) {Escherichia coli} 82.0 91.0 256HI0472 492389 493489 histidinol-phosphate aminotransferase (hisC){Escherichia coli} 60.1 77.5 351 HI1169 1237411 1236314histidinol-phosphate aminotransferase (hisH) {Bacillus subtllis} 38.761.0 354 HI0473 493604 494689 imidazoleglycerol-phosphate dehydratase(hisS) {Escherichia coli) 65.0 80.5 353 HI0477 496900 497562phosphoribosyl-AMP cyclohydrolase (hisIE) {Escherichia coli} 60.7 77.0195 HI0475 495393 496139 phosphoribosylformimino-5-aminoimidazolecarboxamide ribotide isomerase 62.9 77.1 245 (hisA) {Escherichia coli}Pyruvate family HI1581 1642613 1643692 alanine racemase, biosynthetic(alr) {Escherichia coli} 56.3 74.9 358 Branched chain family HI0739791174 791968 acetohydroxy acid synthase II (ilvG) {Escherichia coli}63.6 78.5 386 HI1591 1652923 1651205 acetolactate synthase III largechain (ilvI) {Escherichia 
coli} 69.1 83.9 527 HI1590 1651202 1650714acetolactate synthase III small chain (ilvH) {Escherichia coli} 65.685.0 160 HI1196 1259031 1258003 branched-chain-amino-acid transaminase{Salmonella typhimurium} 32.9 49.8 298 HI0740 791969 793960dihydroxyacid dehydrase (ilvD) {Escherichia coli} 77.9 89.5 614 HI0684723320 724795 ketol-acid reductoisomerase (ilvC) {Escherichia coli} 81.789.6 491 HI0991 1047074 1047673 3-isopropylmalate dehydratase(isopropylmalate isomerase) (leuD) 71.1 86.3 197 {Salmonellatyphimurium} HI0989 1044390 1045463 3-isopropylmalate dehydrogenase(beta-IPM dehydrogenase) (leuB) 68.0 80.1 353 {Salmonella typhimurium}HI0985 1040319 1039678 leuA protein (leuA) {Haemophilus influenzae} 99.5100.0 193 Biosynthesis of cofactors, prosthetic groups, carriers BiotinHI1560 1625092 1623803 7,8-diamino-pelargonic acid aminotransferase(bioA) {Escherichia coli} 58.0 74.1 420 HI1559 1623791 16226527-keto-8-aminopelargonic acid synthetase (bioF) {Bacillus sphaericus}33.5 56.3 370 HI1557 1622004 1621225 biotin biosynthesis: reaction priorto pimeloyl CoA (bioC) {Escherichia coli} 28.6 46.8 151 HI0645 687346684872 biotin sulfoxide reductase (BDS reductase) (bisC) {Escherichiacoli} 54.0 71.8 734 HI1024 1085538 1086535 biotin synthetase (bioB){Escherichia coli} 59.6 77.5 307 HI1556 1621212 1620640 dethiobiotinsynthase (bioD) {Bacillus sphaericus} 42.1 59.6 175 HI1449 15329321532207 dethiobiotin synthetase (bioD) {Escherichia coli} 41.3 62.4 217Folic acid HI1448 1531237 1532112 5,10 methylenetetrahydrofolatereductase (metF) {Escherichia coli} 72.8 83.4 290 HI0611 640325 6394805,10-methylene-tetrahydrofolate dehydrogenase (folD) {Escherichia coli}67.6 82.0 278 HI0064 67257 677607,8-dihydro-6-hydroxymethylpterin-pyrophosphokinase (folK) {Escherichia56.3 77.8 158 coli} HI0459 478432 477392 aminodeoxychorismate lyase(pabC) {Escherichia coli} 40.1 66.5 243 HI1635 1691986 1691351 dedAprotein (dadA) {Escherichia coli} 30.4 55.1 158 HI0901 955417 954938dehydrofolate reductase, 
type I (folA) {Escherichia coli} 53.2 68.4 158HI1339 1412130 1412954 dihydropteroate synthase (folP) {Escherichiacoli} 54.5 70.9 275 HI1469 1547395 1548370 dihydropteroate synthase(folP) {Escherichia coli} 54.5 70.9 275 HI1264 1337544 1338854folylpolyglutamate synthase (folC) {Escherichia coli} 51.7 68.4 409HI1451 1534018 1533365 GTP cyclchydrolase I (folE) {Escherichia coli}63.9 79.0 219 HI1173 1240715 1239732 p-aminobenzoate synthetase (pabB){Escherichia coli} 31.0 53.6 257 Lipoate HI0026 28610 27651 lipoatebiosynthesis protein A (lipA) {Escherichia coli} 73.8 84.1 321 HI002729302 28667 lipoate biosynthesis protein B (lipB) {Escherichia coli}66.7 84.2 181 Molybdopterin HI1681 1743523 1743044 moaC protein (moaC){Escherichia coli} 79.1 89.2 157 HI1682 1744628 1743618 molybdenumcofacter biosynthesis pretein A (moaA) {Escherichia coli} 61.8 78.3 327HI1373 1461582 1461376 molybdenum-pterin binding protein (mopl){Clostridium pasteurianum} 51.5 74.2 66 HI1680 1743078 1742797molybdopterin (MPT) converting factor, subunit 1 (moaD) {Escherichiacoli} 59.3 79.0 81 HI1452 1534156 1535367 molybdopterin biosynthesispretein (chIE) {Escherichia coli} 56.4 72.5 403 HI0118 132351 133133molybdopterin biosynthesis protein (chIN) {Escherichia coli} 27.9 52.9135 HI1453 1535374 1536102 molybdopterin biosynthesis protein (chIN){Escherichia coli} 63.9 78.4 241 HI1679 1742793 1742344 molybdopterinconverting factor, subunit 2 (moaE) {Escherichia coli} 58.0 76.0 150HI0846 892779 892204 molybdopterin-guanine dinucleotide (mob){Escherichia coli} 39.4 61.7 187 Panthothenate HI0633 670462 669530antothenate kinase (coaA) {Escherichia coli} 64.1 78.2 314 PyridoxineHI0865 913165 913851 pyridoxamine phosphate oxidase (pdxH) {Escherichiacoli} 46.0 65.3 213 Riboflavin HI0766 827249 8278933,4-dihydroxy-2-butanone 4-phosphate synthase (ribB) {Escherichia coli}69.6 82.7 213 HI0213 225991 226662 GTP cyclohydrolase II (ribA){Escherichia coli} 68.0 81.4 193 HI0946 1002768 1003883 riboflavinbiosynthesis 
protein RIBG (ribD) {Escherichia coli} 57.6 76.5 361 HI16191678899 1679510 riboflavin synthase alha chain (ribC) {Escherichia coli}65.5 82.3 203 HI1306 1382553 1383071 riboflavin synthase beta chain(ribE) {Escherichia coli} 76.3 89.7 156 Thioredoxin, glutaredoxin,glutathione HI0162 177496 176129 glutathinone reductase (gor){Escherichia coli} 74.2 85.0 450 HI1118 1181697 1181197 thioredoxin(trxA) (Anabaena sp.) 36.6 58.5 82 HI1162 1228652 1228002 thioredoxin(trxA) (Anabaena sp.) 33.3 61.5 39 HI0084 88470 88150 thioredoxin m(trxM) {Anacystis nidulans} 53.3 79.4 107 Menaquinone, ubiquinone HI0285317765 316062 2-succinyl-6-hydroxy-2,4-cyclohexadiene-1-carboxylatesynthase (menD) 46.8 64.4 551 {Escherichia coli} HI0971 1025835 10268754-(2′-carboxyphenyl)-4-oxybutyric acid synthase (menC) {Escherichiacoli} 57.3 74.2 312 HI1192 1256548 1255916 coenzyme PQQ synthesisprotein III (pqqIII) {Acinetobacter calcoaceticus} 25.4 48.6 211 HI09701024963 1025817 DHNA synthase(menB) {Escherichia coli} 86.7 95.1 285HI1442 1525823 1526707 famesyldiphosphate synthase (ispA) {Escherichiacoli} 53.6 71.2 297 HI0195 206694 208049 o-succinylbenzoate-CoA synthase(menE) {Escherichia coli} 46.0 66.8 426 Heme, porphyrin HI1163 12299081228940 ferrochelatase (visA) {Escherichia coli} 51.6 69.4 315 HI0113119848 122079 heme utilization protein (hxuC) {Haemophilus infuenzae}26.4 46.1 695 HI0265 293930 295624 heme-hemopexin utilization (hxuB){Haemophilus influenzae} 98.1 98.9 565 HI0604 631034 629751 hemY protein(hemY) {Escherichia coli}38.9 64.4 355 HI0465 484621 485769oxygen-independent coproporphynnogen III oxidase (hemN) {Salmonella 31.552.3 241 typhimurium} HI1204 1267418 1266477 protoporphyrinogen oxidase(hemG) {Escherichia coli} 36.1 56.8 153 HI1565 1629849 1628974protoporphyrinogen oxidase (hemG) {Escherichia coli} 59.1 72.6 203HI0605 631035 632562 uroporphyrinogen III methylase (hemX) {Escherichiacoli} 39.9 60.3 358 Cell envelope Membranes, lipoproleins, porins HI15851647711 1647247 15 kd 
peptidoglycan-associated lipoprotein (lpp) {Haemophilus influenzae} 94.8 95.5 154
HI0622 653682 652864 28 kDa membrane protein (hlpA) {Haemophilus influenzae} 99.6 100.0 273
HI0304 335684 337249 apolipoprotein N-acyltransferase (cutE) {Escherichia coli} 45.2 64.1 497
HI0362 384880 384035 hydrophobic membrane protein {Streptococcus gordonii} 37.2 66.5 268
HI0409 428260 427478 hydrophobic membrane protein {Streptococcus gordonii} 34.4 61.3 254
HI1573 1634553 1636106 iron-regulated outer membrane protein A (iroA) {Neisseria meningitidis} 28.9 50.9 398
HI0695 736825 737646 lipoprotein (hel) {Haemophilus influenzae} 99.6 99.6 274
HI0707 749215 750429 lipoprotein (nlpD) {Escherichia coli} 48.6 64.8 364
HI0705 748419 748994 lipoprotein B (lppB) {Haemophilus somnus} 72.3 89.5 191
HI0896 946675 947916 membrane fusion protein (mtrC) {Neisseria gonorrhoeae} 30.9 53.6 337
HI0403 421547 422923 outer membrane protein P1 (ompP1) {Haemophilus influenzae} 93.0 97.2 459
HI0140 153446 154522 outer membrane protein P2 (ompP2) {Haemophilus influenzae} 96.7 97.5 361
HI1167 1234699 1235757 outer membrane protein P5 (ompA) {Haemophilus influenzae} 94.1 95.8 353
HI0906 958098 958901 prolipoprotein diacylglyceryl transferase (lgt) {Escherichia coli} 62.8 80.1 285
HI0030 31698 30838 rare lipoprotein A (rlpA) {Escherichia coli} 34.5 57.8 288
HI0924 979182 979727 rare lipoprotein B (rlpB) {Escherichia coli} 33.5 62.1 163
Surface polysaccharides, lipopolysaccharides & antigens
HI1563 1628153 1627302 2-dehydro-3-deoxyphosphooctonate aldolase (kdsA) {Escherichia coli} 81.3 91.5 283
HI0654 696743 695463 3-deoxy-d-manno-octulosonic-acid transferase (kdtA) {Escherichia coli} 50.7 69.9 420
HI1108 1169176 1168139 ADP-heptose-lps heptosyltransferase II (rfaF) {Escherichia coli} 63.6 78.9 345
HI1117 1181141 1180218 ADP-L-glycero-D-mannoheptose-6-epimerase (rfaD) {Escherichia coli} 78.2 87.7 308
HI0058 59659 58898 CTP:CMP-3-deoxy-D-manno-octulosonate-cytidylyl-transferase (kdsB) {Escherichia coli} 65.0 81.7 245
HI0917 970233 969211
firA protein (firA) {Pasteurella multocida} 84.9 91.1 338
HI0870 919974 920723 glycosyl transferase (lgtD) {Neisseria gonorrhoeae} 30.3 55.3 200
HI1584 1646090 1647058 glycosyl transferase (lgtD) {Neisseria gonorrhoeae} 47.3 64.0 328
HI0653 695463 694996 KDTB protein (kdtB) {Escherichia coli} 52.3 75.8 153
HI1684 1746281 1747291 kpsF protein (kpsF) {Escherichia coli} 49.3 70.6 294
HI1543 1607986 1608967 lic-1 operon protein (licA) {Haemophilus influenzae} 99.1 100.0 321
HI1544 1608970 1609885 lic-1 operon protein (licB) {Haemophilus influenzae} 99.0 99.3 303
HI1545 1609845 1610543 lic-1 operon protein (licC) {Haemophilus influenzae} 96.5 99.5 198
HI1546 1610546 1611340 lic-1 operon protein (licD) {Haemophilus influenzae} 88.7 94.0 268
HI1062 1125450 1124254 lipid A disaccharide synthetase (lpxB) {Escherichia coli} 63.2 77.3 382
HI0552 571001 570096 lipooligosaccharide biosynthesis protein {Haemophilus influenzae} 98.3 99.0 298
HI0767 827911 828756 lipooligosaccharide biosynthesis protein {Haemophilus influenzae} 36.4 59.5 267
HI0869 918779 919990 lsg locus hypothetical protein (GB:M94855_1) {Haemophilus influenzae} 60.5 82.5 400
HI1706 1770127 1768916 lsg locus hypothetical protein (GB:M94855_1) {Haemophilus influenzae} 99.3 100.0 401
HI1705 1768916 1768005 lsg locus hypothetical protein (GB:M94855_2) {Haemophilus influenzae} 98.4 98.7 304
HI1704 1768000 1767322 lsg locus hypothetical protein (GB:M94855_3) {Haemophilus influenzae} 96.0 97.4 226
HI1703 1766957 1766157 lsg locus hypothetical protein (GB:M94855_4) {Haemophilus influenzae} 96.1 98.4 257
HI1702 1786142 1765261 lsg locus hypothetical protein (GB:M94855_5) {Haemophilus influenzae} 96.9 98.3 294
HI1701 1765256 1764456 lsg locus hypothetical protein (GB:M94855_6) {Haemophilus influenzae} 98.9 99.3 267
HI1700 1763577 1764341 lsg locus hypothetical protein (GB:M94855_7) {Haemophilus influenzae} 98.4 98.4 255
HI1699 1763439 1762678 lsg locus hypothetical protein (GB:M94855_8) {Haemophilus influenzae} 98.6 99.0 209
HI0263 290317 291357
opsX locus protein (opsX) {Xanthomonas campestris} 35.2 56.7 261
HI1722 1788547 1787483 rfe (CGSC No 294) protein {Escherichia coli} 59.0 77.2 344
HI1147 1212723 1213637 UDP-3-O-acyl N-acetylglucosamine deacetylase (envA) {Escherichia coli} 77.3 88.2 304
HI1063 1126278 1125493 UDP-N-acetylglucosamine acetyltransferase (lpxA) {Escherichia coli} 66.0 79.4 262
HI0875 925083 926096 UDP-N-acetylglucosamine epimerase (rffE) {Escherichia coli} 65.5 79.5 336
HI0874 923609 925021 undecaprenyl-phosphate galactosephosphotransferase (rfbP) {Salmonella typhimurium} 57.9 75.1 465
Surface structures
HI1738 1808251 1804281 adhesin (aidA-I) {Escherichia coli} 29.3 45.8 1196
HI0119 133314 134324 adhesin B precursor (fimA) {Streptococcus parasanguis} 24.5 48.3 309
HI0364 386685 385607 adhesin B precursor (fimA) {Streptococcus parasanguis} 34.6 61.6 302
HI0332 356770 358062 cell envelope protein (oapA) {Haemophilus influenzae} 99.8 100.0 431
HI0713 757120 757425 flagellar switch protein (fliM) {Salmonella typhimurium} 34.1 61.0 41
HI1464 1542848 1542296 invasin precursor (outer membrane adhesin) (yopA) {Yersinia enterocolitica} 38.5 62.1 291
HI0333 358125 358526 opacity associated protein (oapB) {Haemophilus influenzae} 99.2 99.2 132
HI0416 436627 436836 opacity protein (opa66) {Neisseria gonorrhoeae} 74.5 90.9 55
HI1177 1243585 1243947 opacity protein (opa66) {Neisseria gonorrhoeae} 37.7 59.0 181
HI1461 1540805 1540272 opacity protein (opaD) {Neisseria meningitidis} 34.5 55.8 230
HI0300 333052 331661 pilin biogenesis protein (pilB) {Pseudomonas aeruginosa} 44.1 64.8 485
HI0919 973373 970950 protective surface antigen D15 {Haemophilus influenzae} 98.6 99.5 797
Murein sacculus, peptidoglycan
HI1674 1737564 1735481 carboxy-terminal protease, penicillin-binding protein 3 (prc) {Escherichia coli} 52.3 69.5 660
HI1143 1208355 1209272 D-alanine-D-alanine ligase (ddlB) {Escherichia coli} 59.9 75.8 303
HI1333 1408286 1406850 D-alanyl-D-alanine carboxypeptidase (dacB) {Escherichia coli} 43.9 68.2 454
HI0066 68323
69618 N-acetylmuramoyl-L-alanine amidase (amiB) {Escherichia coli} 59.5 77.0 221
HI0383 401990 401532 PC protein (15kd peptidoglycan-associated outer membrane lipoprotein) (pal) {Haemophilus influenzae} 100.0 100.0 153
HI1731 1795566 1797908 penicillin-binding protein 1B (ponB) {Escherichia coli} 47.0 67.5 767
HI0032 34810 32858 penicillin-binding protein 2 (pbp2) {Escherichia coli} 58.8 73.8 609
HI0029 30819 29641 penicillin-binding protein 5 (dacA) {Escherichia coli} 54.8 68.4 362
HI0198 212582 213439 penicillin-insensitive murein endopeptidase (mepA) {Escherichia coli} 49.3 66.7 269
HI1138 1201927 1203006 phospho-N-acetylmuramoyl-pentapeptide-transferase E (mraY) {Escherichia coli} 76.7 88.9 360
HI0038 40689 41741 rod shape-determining protein (mreC) {Escherichia coli} 50.3 74.5 293
HI0031 32865 31753 rod shape-determining protein (mreB) {Escherichia coli} 63.1 80.7 358
HI0037 39473 40606 rod shape-determining protein (mreB) {Escherichia coli} 79.6 89.9 347
HI0039 41744 42229 rod shape-determining protein (mreD) {Escherichia coli} 40.6 71.6 154
HI0831 878792 880570 soluble lytic murein transglycosylase (slt) {Escherichia coli} 40.4 59.3 378
HI1141 1205663 1206715 transferase, peptidoglycan synthesis (murG) {Escherichia coli} 61.7 76.0 350
HI1137 1200560 1201930 UDP-murnac-pentapeptide synthetase (murE) {Escherichia coli} 51.4 68.2 452
HI1136 1199080 1200543 UDP-MurNac-tripeptide synthetase (murE) {Escherichia coli} 55.7 72.6 463
HI0270 301245 302267 UDP-N-acetylenolpyruvoylglucosamine reductase (murB) {Escherichia coli} 57.6 75.6 340
HI1083 1148434 1147163 UDP-N-acetylglucosamine enolpyruvyl transferase (murZ) {Escherichia coli} 72.4 84.5 419
HI1142 1206856 1208280 UDP-N-acetylmuramate-alanine ligase (murC) {Escherichia coli} 68.2 81.8 470
HI1139 1203132 1204442 UDP-N-acetylmuramoylalanine-D-glutamate ligase (murD) {Escherichia coli} 61.0 73.7 437
HI1499 1569479 1569826 N-acetylmuramoyl-L-alanine amidase {Bacteriophage T3} 42.9 62.2 97
Central intermediary metabolism
Phosphorus compounds
HI0697 739608 738640 exopolyphosphatase (ppx) {Escherichia coli} 55.2 76.7 318
HI0124 139861 139334 inorganic pyrophosphatase (ppa) {Escherichia coli} 36.3 50.3 157
HI0647 689574 688637 lysophospholipase L2 (pldB) {Escherichia coli} 31.2 53.1 317
Sulfur metabolism
HI1374 1462019 1461693 desulfoviridin gamma subunit (dsvC) {Desulfovibrio vulgaris} 36.0 58.0 99
HI0807 854438 853741 putative arylsulfatase regulatory protein (aslB) {Escherichia coli} 47.4 67.0 381
HI0561 578539 577856 sulfite synthesis pathway protein (cysQ) {Escherichia coli} 35.9 56.0 205
Polyamine biosynthesis
HI0099 106307 1107374 nucleotide binding protein (potG) {Escherichia coli} 42.6 66.9 340
HI0593 614187 1612028 ornithine decarboxylase (speF) {Escherichia coli} 66.4 80.2 717
Polysaccharides - (cytoplasmic)
HI1360 1436170 1438359 1,4-alpha-glucan branching enzyme (glgB) {Escherichia coli} 64.5 80.1 723
HI1362 1440427 1441758 ADP-glucose synthetase (glgC) {Escherichia coli} 55.0 74.3 407
HI1364 1443545 1446007 alpha-glucan phosphorylase (glgP) {Escherichia coli} 61.1 79.1 809
HI1361 1438458 1440434 glycogen operon protein (glgX) {Escherichia coli} 54.3 67.8 501
HI1363 1441869 1443296 glycogen synthase (glgA) {Escherichia coli} 56.2 71.2 475
Degradation of polysaccharides
HI1359 1434061 1436157 amylomaltase (malQ) {Escherichia coli} 40.9 62.0 615
HI1420 1507662 1507063 endochitinase {Oryza sativa} 38.9 50.9 106
Amino sugars
HI0431 452989 451160 glutamine amidotransferase (glmS) {Escherichia coli} 72.1 84.3 609
HI0141 155859 154717 N-acetylglucosamine-6-phosphate deacetylase (nagA) {Escherichia coli} 54.5 72.1 376
HI0142 156944 156135 nagB protein (nagB) {Escherichia coli} 74.2 88.1 260
Other
HI0048 49257 48403 7-alpha-hydroxysteroid dehydrogenase (hdhA) {Escherichia coli} 32.4 55.1 244
HI1207 1271536 1270334 acetate kinase (ackA) {Escherichia coli} 69.1 83.9 396
HI0951 1009728 1008367 GABA transaminase (gabT) {Escherichia coli} 34.4 55.8 420
HI0111 118858 119484 glutathione transferase (bphH) {Pseudomonas sp.} 37.6 57.4 200
HI0693 734488 735996 glycerol kinase (glpK) {Escherichia coli} 76.9 89.2 502
HI0586 606429 605161 hippuricase (hipO) {Campylobacter jejuni} 27.8 49.6 376
HI0543 564874 564575 urease (ureA) {Helicobacter heilmannii} 62.4 76.2 101
HI0539 561668 561087 urease accessory protein (UreF) {Bacillus sp.} 31.8 54.9 194
HI0541 564179 562464 urease alpha subunit (urea amidohydrolase) (ureC) {Bacillus sp.} 67.3 62.1 569
HI0540 562333 561779 urease protein (ureE) {Helicobacter pylori} 31.0 56.8 155
HI0538 560981 560307 urease protein (ureG) {Helicobacter pylori} 70.7 86.9 198
HI0537 560229 559447 urease protein (ureH) {Helicobacter pylori} 31.5 53.9 213
HI0542 564180 564574 urease subunit B (ureB) {Escherichia coli} 61.8 77.5 103
Energy metabolism
Amino acids, amines
HI0536 559266 557842 aspartase (aspA) {Escherichia coli} 78.2 89.1 468
HI0597 617739 616810 carbamate kinase (arcC) {Pseudomonas aeruginosa} 78.3 87.7 309
HI0747 802651 803697 L-asparaginase II (ansB) {Escherichia coli} 70.5 81.2 329
HI0290 323270 321907 L-serine deaminase (sdaA) {Escherichia coli} 68.6 83.3 454
Sugars
HI0820 869307 868288 aldose 1-epimerase precursor (mutarotase) (mro) {Acinetobacter calcoaceticus} 36.8 54.7 326
HI0055 55016 56197 D-mannonate hydrolase (uxuA) {Escherichia coli} 72.8 85.8 394
HI1119 1181808 1182476 deoxyribose aldolase (deoC) {Mycoplasma hominis} 49.0 68.5 200
HI0615 644708 643299 fucokinase (fucK) {Escherichia coli} 41.1 64.5 459
HI0613 642828 642181 fuculose-1-phosphate aldolase (fucA) {Escherichia coli} 64.7 81.4 215
HI1014 1075981 1076610 fuculose-1-phosphate aldolase (fucA) {Escherichia coli} 32.9 51.8 163
HI0821 870510 869320 galactokinase (galK) {Haemophilus influenzae} 98.4 99.0 384
HI0145 159883 158984 glucose kinase (glk) {Streptomyces coelicolor} 33.6 53.2 303
HI0616 646595 644784 L-fucose isomerase (fucI) {Escherichia coli} 69.5 84.5 583
HI1027 1090247 1089519 L-ribulose-phosphate 4-epimerase (araD) {Escherichia coli} 72.3 81.8 231
HI1111 1173107 1171938 mal inducer biosynthesis blocker
(malY) {Escherichia coli} 28.1 51.6 375
HI0143 158111 157233 N-acetylneuraminate lyase (nanA) {Escherichia coli} 36.2 61.4 291
HI0507 521330 522247 ribokinase (rbsK) {Escherichia coli} 56.0 74.8 302
HI1115 1177307 1178623 xylose isomerase (xylA) {Escherichia coli} 71.3 87.2 439
HI1116 1178629 1180161 xylulose kinase (xylulokinase) {Escherichia coli} 33.1 50.0 479
Glycolysis
HI0449 470280 469342 1-phosphofructokinase (fruK) {Escherichia coli} 55.4 74.1 304
HI0984 1039579 1038617 6-phosphofructokinase (pfkA) {Escherichia coli} 74.4 84.4 319
HI0934 990636 989329 enolase (eno) {Bacillus subtilis} 65.9 78.5 413
HI0526 547668 546592 fructose-bisphosphate aldolase (fba) {Escherichia coli} 71.3 85.8 359
HI1582 1643750 1645438 glucose-6-phosphate isomerase (pgi) {Escherichia coli} 76.9 88.7 548
HI0001 1 600 glyceraldehyde-3-phosphate dehydrogenase (gapdH) {Escherichia coli} 85.8 90.3 133
HI0527 548939 547782 phosphoglycerate kinase (pgk) {Escherichia coli} 81.1 90.7 387
HI0759 820852 821533 phosphoglyceromutase (gpmA) {Zymomonas mobilis} 58.9 74.6 222
HI1579 1639619 1641052 pyruvate kinase type II (pykA) {Escherichia coli} 77.2 87.5 480
HI0680 719664 720452 triosephosphate isomerase (tpiA) {Escherichia coli} 74.4 80.7 253
Pyruvate dehydrogenase
HI1235 1303195 1301495 dihydrolipoamide acetyltransferase (aceF) {Escherichia coli} 72.8 82.4 526
HI0194 206108 205248 dihydrolipoamide acetyltransferase (acoC) {Pseudomonas putida} 27.8 49.1 235
HI1234 1301378 1299945 lipoamide dehydrogenase (lpdA) {Escherichia coli} 81.5 91.6 474
HI1236 1305918 1303261 pyruvate dehydrogenase (aceE) {Escherichia coli} 68.6 84.0 886
TCA cycle
HI1668 1731748 1728899 2-oxoglutarate dehydrogenase (sucA) {Escherichia coli} 69.0 80.7 930
HI0025 27397 26393 acetate:SH-citrate lyase ligase (AMP) {Klebsiella pneumoniae} 48.9 68.4 321
HI0022 25179 23680 citrate lyase alpha chain (acyl lyase subunit) (citF) {Klebsiella pneumoniae} 72.1 86.1 469
HI0023 26068 25457 citrate lyase beta chain (acyl lyase subunit) {Klebsiella pneumoniae} 62.3
81.9 203
HI0024 26352 26068 citrate lyase gamma chain (acyl lyase subunit) (citD) {Klebsiella pneumoniae} 52.1 71.9 97
HI1667 1728793 1727567 dihydrolipoamide succinyltransferase (sucB) {Escherichia coli} 73.6 84.5 403
HI1403 1493925 1495316 fumarate hydratase class II (fumarase) (fumC) {Escherichia coli} 61.8 74.2 460
HI1213 1275907 1276839 malate dehydrogenase (mdh) {Escherichia coli} 78.5 85.1 303
HI1248 1317431 1319698 malic acid enzyme {Bacillus stearothermophilus} 49.5 68.3 376
HI1200 1262687 1263565 succinyl-CoA synthetase alpha-subunit (sucD) {Escherichia coli} 83.4 91.7 289
HI1199 1261518 1262684 succinyl-CoA synthetase beta-subunit (sucC) {Escherichia coli} 64.7 80.2 388
Pentose phosphate pathway
HI0555 574159 572708 6-phosphogluconate dehydrogenase, decarboxylating (gnd) {Escherichia coli} 54.0 71.1 464
HI0560 577777 576296 glucose-6-phosphate 1-dehydrogenase (G6PD) {Synechococcus sp.} 46.2 65.3 483
HI1025 1088660 1086666 transketolase 1 (TK 1) (tktA) {Escherichia coli} 77.1 87.5 664
Entner-Doudoroff
HI0047 48381 47746 2-keto-3-deoxy-6-phosphogluconate aldolase (eda) {Escherichia coli} 37.3 63.2 193
HI0049 50201 49260 2-keto-3-deoxy-D-gluconate kinase (kdgK) {Erwinia chrysanthemi} 44.2 64.5 300
Aerobic
HI1655 1715678 1713987 D-lactate dehydrogenase (dld) {Escherichia coli} 59.5 77.7 560
HI1166 1234330 1231250 D-lactate dehydrogenase (dld) {Saccharomyces cerevisiae} 27.6 47.7 427
HI0607 635168 636172 glycerol-3-phosphate dehydrogenase (gpsA) {Escherichia coli} 66.6 81.5 335
HI0749 805382 806713 NADH dehydrogenase (ndh) {Escherichia coli} 57.8 75.4 430
Anaerobic
HI1049 1112944 1110527 anaerobic dimethyl sulfoxide reductase A (dmsA) {Escherichia coli} 74.0 86.3 765
HI1048 1110513 1109899 anaerobic dimethyl sulfoxide reductase B (dmsB) {Escherichia coli} 72.1 84.8 204
HI1047 1109894 1109058 anaerobic dimethyl sulfoxide reductase C (dmsC) {Escherichia coli} 41.0 65.0 287
HI0646 688485 687382 cytochrome C-type protein (torC) {Escherichia coli} 37.4 54.7 365
HI0350 374535 375134
denitrification system component (nirT) {Pseudomonas stutzeri} 51.7 71.6 176
HI0009 9878 10783 fdhE protein (fdhE) {Escherichia coli} 50.8 71.6 307
HI0006 5067 8156 formate dehydrogenase, nitrate-inducible major subunit (fdnG) {Escherichia coli} 64.4 79.2 1016
HI0005 4802 3993 formate dehydrogenase-N affector (fdhD) {Escherichia coli} 57.7 71.0 249
HI0008 9035 9805 formate dehydrogenase-O gamma subunit (fdoI) {Escherichia coli} 52.8 72.1 195
HI0007 8161 9096 formate dehydrogenase-O beta subunit (fdoH) {Escherichia coli} 72.2 85.6 297
HI1071 1133439 1131826 formate-dependent nitrite reductase (cytochrome C552) (nrfA) {Escherichia coli} 56.7 75.3 450
HI1070 1131779 1131102 formate-dependent nitrite reductase (nrfB) {Escherichia coli} 50.0 66.9 134
HI1069 1131102 1130428 formate-dependent nitrite reductase protein Fe-S centers (nrfC) {Escherichia coli} 64.2 81.2 217
HI1068 1130428 1129466 formate-dependent nitrite reductase transmembrane protein (nrfD) {Escherichia coli} 48.2 68.4 312
HI0835 882094 882529 fumarate reductase (frdC) {Escherichia coli} 49.2 72.3 129
HI0834 882093 881752 fumarate reductase 13 kDa hydrophobic protein (frdD) {Escherichia coli} 53.0 76.5 119
HI0837 885089 883293 fumarate reductase, flavoprotein subunit (frdA) {Escherichia coli} 75.4 67.2 602
HI0836 883357 882530 fumarate reductase, iron-sulfur protein (frdB) {Escherichia coli} 75.5 85.3 244
HI0681 720855 720541 glpE protein (glpE) {Escherichia coli} 43.3 63.5 103
HI0620 651184 651759 glpG protein (glpG) {Escherichia coli} 39.1 64.8 178
HI0687 729180 727492 glycerol-3-phosphate dehydrogenase, subunit A (glpA) {Escherichia coli} 69.9 82.7 531
HI0686 727529 726204 glycerol-3-phosphate dehydrogenase, subunit B (glpB) {Escherichia coli} 42.3 60.3 414
HI0685 726189 724912 glycerol-3-phosphate dehydrogenase, subunit C (glpC) {Escherichia coli} 58.8 76.0 393
HI1395 1487087 1487358 hydrogenase isoenzymes formation protein (hypC) {Escherichia coli} 63.2 81.6 76
Electron transport
HI0887 936816 938552 C-type cytochrome biogenesis
protein (copper tolerance) (cycZ) {Escherichia coli} 48.8 67.7 557
HI1078 1141318 1139756 cytochrome oxidase d subunit I (cydA) {Escherichia coli} 64.3 82.4 515
HI1077 1139738 1138605 cytochrome oxidase d subunit II (cydB) {Escherichia coli} 60.9 78.4 379
HI0529 549872 550341 ferredoxin (fdx) {Chromatium vinosum} 59.5 77.2 78
HI0374 394564 394226 ferredoxin (fdx) {Escherichia coli} 64.5 83.6 110
HI0192 205148 204627 flavodoxin (fldA) {Escherichia coli} 76.9 87.3 173
HI1365 1446272 1447807 NAD(P) transhydrogenase subunit alpha (pntA) {Escherichia coli} 73.7 84.1 509
HI1366 1447821 1449242 NAD(P) transhydrogenase subunit beta (pntB) {Escherichia coli} 80.5 87.7 462
HI1281 1355273 1354614 NAD(P)H-flavin oxidoreductase {Vibrio fischeri} 33.3 54.8 211
Fermentation
HI0501 514365 515657 aldehyde dehydrogenase (aldH) {Escherichia coli} 41.2 61.8 236
HI0776 636764 836114 butyrate-acetoacetate CoA-transferase subunit A (ctfA) {Clostridium acetobutylicum} 53.3 75.2 214
HI0186 200017 198884 glutathione-dependent formaldehyde dehydrogenase (gd-faldH) {Paracoccus denitrificans} 58.5 77.6 375
HI1308 1383529 1384563 hydrogenase gene region (hypE) {Alcaligenes eutrophus} 28.1 48.2 237
HI1642 1698196 1700833 phosphoenolpyruvate carboxylase (ppc) {Escherichia coli} 64.8 80.0 883
HI0181 193936 191621 pyruvate formate-lyase (pfl) {Escherichia coli} 86.1 92.9 760
HI0180 191487 190750 pyruvate formate-lyase activating enzyme (act) {Escherichia coli} 74.0 85.4 246
HI1435 1517826 1518581 short chain alcohol dehydrogenase (ORFB) {Dichelobacter nodosus} 51.9 69.2 104
Gluconeogenesis
HI1651 1709919 1710917 fructose-1,6-bisphosphatase (fbp) {Escherichia coli} 70.5 84.0 331
HI0811 8590381 857425 phosphoenolpyruvate carboxykinase (pckA) {Escherichia coli} 71.7 83.0 444
ATP-proton motive force interconversion
HI0486 504824 504573 ATP synthase C chain (atpE) {Vibrio alginolyticus} 62.7 81.9 83
HI0487 505668 504883 ATP synthase F0 subunit a (atpB) {Escherichia coli} 58.2 78.1 261
HI0485 504520 504053 ATP synthase F0 subunit b
(atpF) {Escherichia coli} 63.5 79.5 156
HI0483 503491 501953 ATP synthase F1 alpha subunit (atpA) {Escherichia coli} 86.5 94.7 513
HI0481 501081 499678 ATP synthase F1 beta subunit (atpD) {Escherichia coli} 89.3 96.1 460
HI0484 504037 503507 ATP synthase F1 delta subunit (atpH) {Escherichia coli} 58.0 78.4 176
HI0480 499645 499220 ATP synthase F1 epsilon subunit (atpC) {Escherichia coli} 59.6 75.7 136
HI0482 501934 501068 ATP synthase F1 gamma subunit (atpG) {Escherichia coli} 65.3 83.0 287
HI1277 1349508 1350221 ATP synthase subunit 3 region protein (atp) {Rhodopseudomonas blastica} 31.9 50.0 237
Fatty acid/phospholipid metabolism
HI0773 834230 832896 acetyl coenzyme A acetyltransferase (thiolase) (fadA) {Clostridium acetobutylicum} 63.0 80.4 391
HI0428 448891 448169 fadR protein involved in fatty acid metabolism (fadR) {Escherichia coli} 47.4 68.4 234
HI1064 1126738 1126295 (3R)-hydroxymyristol acyl carrier protein dehydrase (fabZ) {Escherichia coli} 68.1 85.1 141
HI0156 171552 170827 3-ketoacyl-acyl carrier protein reductase (fabG) {Escherichia coli} 73.4 88.4 241
HI0408 427385 426441 acetyl-CoA carboxylase (accA) {Escherichia coli} 75.3 88.3 318
HI0155 170568 170341 acyl carrier protein (acpP) {Escherichia coli} 82.7 90.7 75
HI0076 82175 83032 acyl-CoA thioesterase II (tesB) {Escherichia coli} 52.3 73.1 283
HI1539 1605754 1604537 beta-ketoacyl-ACP synthase I (fabB) {Escherichia coli} 72.8 83.7 403
HI0158 174085 173138 beta-ketoacyl-acyl carrier protein synthase III (fabH) {Escherichia coli} 65.9 79.8 317
HI0973 1027538 1028002 biotin carboxyl carrier protein (accB) {Escherichia coli} 71.2 82.7 156
HI0974 1028180 1029523 biotin carboxylase (accC) {Escherichia coli} 81.5 91.3 448
HI1328 1404041 1404571 D-3-hydroxydecanoyl-(acyl carrier-protein) dehydratase (fabA) {Escherichia coli} 79.2 91.7 168
HI0337 362881 363234 diacylglycerol kinase (dgkA) {Escherichia coli} 50.9 71.8 110
HI0002 601 2421 long chain fatty acid CoA ligase {Homo sapiens} 29.5 52.8 575
HI0157 172507 171572 malonyl coenzyme
A-acyl carrier protein transacylase (fabD) {Escherichia coli} 71.0 81.6 309
HI1740 1811556 1810672 short chain alcohol dehydrogenase homolog (envM) {Escherichia coli} 75.3 84.9 259
HI1438 1521691 1520741 USG-1 protein (usg) {Escherichia coli} 32.7 53.9 334
HI0736 788371 787652 1-acyl-glycerol-3-phosphate acyltransferase (plsC) {Escherichia coli} 62.2 78.2 238
HI0291 975561 974698 CDP-diglyceride synthetase (cdsA) {Escherichia coli} 48.4 66.5 246
HI0750 809228 806799 glycerol-3-phosphate acyltransferase (plsB) {Escherichia coli} 57.3 75.7 804
HI0212 225946 225224 phosphatidylglycerophosphate phosphatase B (pgpB) {Escherichia coli} 35.7 60.3 220
HI0123 138207 138761 phosphatidylglycerophosphate synthase (pgsA) {Escherichia coli} 66.5 83.0 182
HI0161 175145 176014 phosphatidylserine decarboxylase proenzyme (psd) {Escherichia coli} 57.6 75.5 280
HI0427 446754 448118 phosphatidylserine synthase (pssA) {Escherichia coli} 49.2 70.8 452
HI0691 732349 733440 protein D (hpd) {Haemophilus influenzae} 98.4 99.2 364
Purines, pyrimidines, nucleosides and nucleotides
Purine ribonucleotide biosynthesis
HI1622 1682920 1684005 5′-phosphoribosyl-5-amino-4-imidazole carboxylase II (purK) {Escherichia coli} 50.8 71.9 351
HI1434 1517646 1516615 5′-phosphoribosyl-5-aminoimidazole synthetase (purM) {Escherichia coli} 76.5 86.7 344
HI1749 1829283 1828660 5′-guanylate kinase (gmk) {Escherichia coli} 64.7 81.6 206
HI0351 375941 375300 adenylate kinase (ATP-AMP transphosphorylase) (adk) {Haemophilus influenzae} 99.5 99.5 214
HI0641 679574 681094 adenylosuccinate lyase (purB) {Escherichia coli} 76.5 87.9 456
HI1639 1694462 1695757 adenylosuccinate synthetase (purA) {Escherichia coli} 75.7 87.3 432
HI1210 1272783 1274297 amidophosphoribosyltransferase (purF) {Escherichia coli} 89.1 84.0 504
HI0754 812369 816328 formylglycineamide ribonucleotide synthetase (purL) {Escherichia coli} 69.7 82.0 1290
HI1594 1655627 1656460 formyltetrahydrofolate hydrolase (purU) {Escherichia coli} 72.6 85.2 277
HI0223 250532 252100 guaA protein
(guaA) {Escherichia coli} 78.1 87.6 525
HI0222 248355 249818 inosine-5′-monophosphate dehydrogenase (guaB) {Acinetobacter calcoaceticus} 62.7 80.9 487
HI0878 928811 929233 nucleoside diphosphate kinase (ndk) {Escherichia coli} 63.0 73.9 138
HI0890 940953 942239 phosphoribosylamine-glycine ligase (purD) {Escherichia coli} 75.2 84.5 427
HI1621 1682355 1682847 phosphoribosylaminoimidazole carboxylase catalytic subunit (purE) {Haemophilus influenzae} 94.4 96.9 161
HI0889 939259 940854 phosphoribosylaminoimidazolecarboxamide formyltransferase (purH) {Escherichia coli} 77.2 86.5 525
HI1433 1516557 1515922 phosphoribosylglycinamide formyltransferase (purN) {Escherichia coli} 51.9 71.4 210
HI1615 1674317 1675261 phosphoribosylpyrophosphate synthetase (prsA) {Salmonella typhimurium} 84.1 91.1 314
HI1732 1798036 1798953 SAICAR synthetase (purC) {Streptococcus pneumoniae} 29.8 54.8 204
Pyrimidine ribonucleotide biosynthesis
HI1406 1497997 1496981 dihydroorotate dehydrogenase (dihydroorotate oxidase) (pyrD) {Escherichia coli} 60.7 77.4 334
HI0274 305799 305161 orotate phosphoribosyltransferase (pyrE) {Escherichia coli} 69.0 83.6 213
HI1228 1293965 1294282 pyrF operon encoding orotidine 5′-monophosphate (OMP) decarboxylase {Escherichia coli} 77.1 87.6 105
HI1227 1293266 1293955 pyrF protein (pyrF) {Escherichia coli} 62.3 79.4 228
HI0461 480053 479517 uracil phosphoribosyltransferase (pyrR) {Bacillus caldolyticus} 52.2 73.9 179
2′-deoxyribonucleotide metabolism
HI0075 79934 82054 anaerobic ribonucleoside-triphosphate reductase (nrdD) {Escherichia coli} 77.4 88.2 702
HI0133 146656 147240 deoxycytidine triphosphate deaminase (dcd) {Escherichia coli} 75.6 86.5 193
HI0956 1012787 1013239 deoxyuridinetriphosphatase (dut) {Escherichia coli} 75.5 90.7 151
HI1538 1604204 1604464 glutaredoxin (grx) {Escherichia coli} 69.9 79.5 83
HI1666 1726318 1727445 nrdB protein (nrdB) {Escherichia coli} 85.4 92.6 376
HI1665 1723831 1726173 ribonucleoside-diphosphate reductase 1 alpha chain (nrdA) {Escherichia coli} 83.4 92.2 761
HI1161
1227925 1226972 thioredoxin reductase (trxB) {Escherichia coli} 75.9 85.8 316
HI0907 958914 959762 thymidylate synthetase (thyA) {Escherichia coli} 35.3 55.0 264
Salvage of nucleosides and nucleotides
HI0585 605064 603094 2′,3′-cyclic-nucleotide-2′-phosphodiesterase (cpdB) {Escherichia coli} 62.4 77.7 641
HI1233 1299794 1299255 adenine phosphoribosyltransferase (apt) {Escherichia coli} 66.1 83.1 177
HI0553 571120 571943 adenosine-tetraphosphatase (apaH) {Escherichia coli} 52.4 73.1 271
HI1353 1426390 1427265 cytidine deaminase (cytidine aminohydrolase) (cda) {Escherichia coli} 50.0 63.4 253
HI1222 1288579 1289628 cytidylate kinase (cmk) {Escherichia coli} 64.5 79.3 217
HI1652 1711636 1710842 cytidylate kinase (cmk) {Escherichia coli} 63.5 76.8 202
HI0520 540879 540166 purine-nucleoside phosphorylase (deoD) {Escherichia coli} 84.3 90.2 235
HI0531 552177 551599 thymidine kinase (tdk) {Escherichia coli} 68.6 82.4 188
HI1231 1297050 1296427 uracil phosphoribosyltransferase (upp) {Escherichia coli} 83.2 93.8 208
HI0282 312879 313655 uridine phosphorylase (udp) {Escherichia coli} 72.0 84.8 250
HI0676 716559 716095 xanthine guanine phosphoribosyl transferase gpt (xgprt) {Escherichia coli} 72.1 87.7 152
HI0694 736541 736077 xanthine guanine phosphoribosyltransferase (xgprt) {Salmonella typhimurium} 74.0 87.7 152
HI1280 1353404 1354561 putative ATPase (mrp) {Escherichia coli} 66.0 79.0 353
Sugar-nucleotide biosynthesis, conversions
HI0207 219511 221319 5′-nucleotidase (ushA) {Homo sapiens} 34.5 54.8 487
HI1282 1355378 1356061 CMP-NeuNAc synthetase (siaB) {Neisseria meningitidis} 47.1 64.3 221
HI0822 871597 870551 galactose-1-phosphate uridylyltransferase (galT) {Haemophilus influenzae} 99.1 100.0 349
HI0814 662632 861748 glucosephosphate uridylyltransferase (galU) {Escherichia coli} 74.0 86.1 287
HI0353 378461 377448 UDP-glucose 4-epimerase (galactowaldenase) (galE) {Haemophilus influenzae} 99.1 99.1 338
HI0644 682446 683813 UDP-N-acetylglucosamine pyrophosphorylase (glmU) {Escherichia coli} 68.6 83.1
456
Nucleotide and nucleoside interconversions
HI1302 1376759 1378139 deoxyguanosine triphosphate triphosphohydrolase (dgt) {Escherichia coli} 38.2 57.6 469
HI1079 1141970 1143603 pyrG protein (pyrG) {Escherichia coli} 80.4 90.5 545
HI0132 146006 146644 uridine kinase (uridine monophosphokinase) (udk) {Escherichia coli} 67.8 84.7 202
Regulatory functions
HI0606 632563 635091 adenylate cyclase (cyaA) {Haemophilus influenzae} 100.0 100.0 843
HI0886 936624 935917 aerobic respiration control protein ARCA (DYE resistance protein) (arcA) {Escherichia coli} 77.2 87.8 237
HI0221 238723 248354 aerobic respiration control sensor protein (arcB) {Escherichia coli} 45.7 70.4 768
HI1054 1117872 1116979 araC-like transcription regulator {Streptomyces lividans} 25.7 47.7 303
HI1212 1275700 1275248 arginine repressor protein (argR) {Escherichia coli} 69.1 81.2 149
HI0237 265657 265310 arsC protein (arsC) {Plasmid R773} 38.3 56.5 114
HI0464 482094 484502 ATP-dependent proteinase (lon) {Escherichia coli} 74.5 87.9 769
HI0336 360636 362863 ATP:GTP 3′-pyrophosphotransferase (relA) {Escherichia coli} 62.9 80.5 741
HI1130 1193658 1195126 carbon starvation protein (cstA) {Escherichia coli} 32.1 53.5 499
HI0815 862645 862657 carbon storage regulator (csrA) {Escherichia coli} 68.4 91.2 57
HI0806 653619 853063 cyclic AMP receptor protein (crp) {Haemophilus influenzae} 27.2 46.7 174
HI0959 1014161 1014832 cyclic AMP receptor protein (crp) {Haemophilus influenzae} 100.0 100.0 224
HI1203 1265444 1266412 cys regulon transcriptional activator (cysB) {Escherichia coli} 63.3 79.3 324
HI0191 204595 204158 ferric uptake regulation protein (fur) {Escherichia coli} 61.4 75.0 139
HI1457 1537858 1537391 fimbrial transcription regulation repressor (pilB) {Neisseria gonorrhoeae} 32.3 53.2 124
HI1459 1539614 1538556 fimbrial transcription regulation repressor (pilB) {Neisseria gonorrhoeae} 59.0 72.6 325
HI1263 1336661 1337548 folylpolyglutamate-dihydrofolate synthetase expression regulator (accD) {Escherichia coli} 69.5 82.5 290
HI1430 1512975 1513745 fumarate (and nitrate) reduction regulatory protein (fnr) {Escherichia coli} 78.8 88.8 240
HI0823 871805 872800 galactose operon repressor (galS) {Haemophilus influenzae} 99.1 99.4 332
HI0756 817661 818569 glucokinase regulator {Rattus norvegicus} 31.8 56.1 512
HI0621 651792 652556 glycerol-3-phosphate regulon repressor (glpR) {Escherichia coli} 61.5 77.4 252
HI1011 11073676 1073047 glycerol-3-phosphate regulon repressor (glpR) {Escherichia coli} 28.6 50.3 198
HI1197 1259493 1260395 glycine cleavage system transcriptional activator (gcvA) {Escherichia coli} 51.7 69.1 298
HI0013 13742 12837 GTP-binding protein (era) {Escherichia coli} 77.9 87.0 299
HI0879 930478 929309 GTP-binding protein (obg) {Bacillus subtilis} 47.7 70.9 332
HI0573 592001 591099 hydrogen peroxide-inducible activator (oxyR) {Escherichia coli} 71.1 85.9 298
HI0617 647526 646780 L-fucose operon activator (fucR) {Escherichia coli} 35.1 56.1 229
HI0401 420131 420952 lacZ expression regulator (icc) {Escherichia coli} 52.9 71.3 261
HI0225 253133 253636 leucine responsive regulatory protein (lrp) {Escherichia coli} 29.6 52.6 152
HI1602 1663150 1662653 leucine responsive regulatory protein (lrp) {Escherichia coli} 77.2 86.7 158
HI0751 809477 810103 LEXA repressor (lexA) {Escherichia coli} 68.1 85.3 202
HI1465 1542848 1542810 lipooligosaccharide protein (lex2A) {Haemophilus influenzae} 44.4 66.7 9
HI1466 1542849 1543428 lipooligosaccharide protein (lex2A) {Haemophilus influenzae} 50.0 66.7 48
HI0296 328190 327876 metF aporepressor (metJ) {Escherichia coli} 81.9 93.3 105
HI1478 1558154 1557312 molybdenum transport system alternative nitrogenase regulator (modD) {Rhodobacter capsulatus} 31.8 51.7 259
HI0200 214274 215227 msbB protein (msbB) {Escherichia coli} 45.3 67.0 301
HI0411 429238 430662 msbB protein (msbB) {Escherichia coli} 50.9 69.3 284
HI0712 756824 757117 negative regulator of translation (relB) {Escherichia coli} 28.3 48.3 60
HI0631 667822 668406 negative rpo regulator (mclA) {Escherichia coli} 40.1
62.9 199
HI0269 299532 301232 nitrate sensor protein (narQ) {Escherichia coli} 38.6 63.0 555
HI0728 778003 777380 nitrate/nitrite response regulator protein (narP) {Escherichia coli} 59.6 79.3 205
HI0339 363915 364250 nitrogen regulatory protein P-II (glnB) {Escherichia coli} 77.7 93.8 112
HI1747 1828067 1826037 penta-phosphate guanosine-3′-pyrophosphohydrolase (spoT) {Escherichia coli} 58.8 76.6 675
HI1381 1475017 1473741 phosphate regulon sensor protein (phoR) {Escherichia coli} 41.8 66.8 335
HI1382 1475709 1475017 phosphate regulon transcriptional regulatory protein (phoB) {Escherichia coli} 52.9 71.8 227
HI0765 827030 825768 probable nadAB transcriptional regulator (nadR) {Escherichia coli} 54.6 75.1 349
HI1641 1697003 1698115 purine nucleotide synthesis repressor protein (purR) {Escherichia coli} 55.9 74.5 328
HI0164 178405 178713 putative murein gene regulator (bolA) {Escherichia coli} 47.1 65.7 102
HI0508 522278 523273 rbs repressor (rbsR) {Escherichia coli} 48.8 71.0 329
HI0565 582225 581776 regulatory protein (asnC) {Escherichia coli} 68.0 81.0 147
HI1617 1677452 1676583 regulatory protein sfs1 involved in maltose metabolism (sfsA) {Escherichia coli} 54.3 71.2 218
HI0895 946128 946688 repressor for cytochrome P450 (Bm3R1) {Bacillus megaterium} 23.3 50.6 182
HI0271 302396 303238 RNA polymerase sigma-32 factor (heat shock regulatory protein F33.4) (rpoH) {Escherichia coli} 70.8 86.8 281
HI0535 555646 557532 RNA polymerase sigma-70 factor (rpoD) {Escherichia coli} 68.9 80.8 608
HI0630 667228 667794 RNA polymerase sigma-E factor (rpoE) {Escherichia coli} 73.0 87.8 189
HI1713 1781137 1779785 sensor protein for basR (basS) {Escherichia coli} 30.0 55.7 253
HI1444 1529117 1528668 stringent starvation protein (sspB) {Escherichia coli} 63.2 81.1 106
HI1445 1529755 1529120 stringent starvation protein A (sspA) {Haemophilus somnus} 76.9 87.3 212
HI1745 1815630 1814704 trans-activator of metE and metH (metR) {Escherichia coli} 39.5 60.8 294
HI0360 382477 383121 transcription activator (tenA)
{Bacillus subtilis} 27.8 48.3 208
HI0683 722643 721768 transcriptional activator protein (ilvY) {Escherichia coli} 47.4 70.3 293
HI1714 1781799 1781137 transcriptional regulatory protein (basR) {Escherichia coli} 43.5 59.7 216
HI0412 430780 431733 transcriptional regulatory protein (tyrR) {Escherichia coli} 48.2 66.8 306
HI0832 880611 880913 tryptophan repressor (trpR) {Enterobacter aerogenes} 39.8 67.0 88
HI0054 54188 54985 uxu operon regulator (uxuR) {Escherichia coli} 50.0 72.1 246
HI1109 1170415 1169255 xylose operon regulatory protein (xylR) {Escherichia coli} 57.3 75.3 384
Replication
DNA - replication, restr/modification, recombination
HI0761 822003 823136 A/G-specific adenine glycosylase (mutY) {Escherichia coli} 61.6 75.1 341
HI0995 1056674 1055313 chromosomal replication initiator protein (dnaA) {Escherichia coli} 61.7 79.7 464
HI1229 1294415 1294317 chromosomal replication initiator protein (dnaA) {Escherichia coli} 50.0 75.0 12
HI0316 345720 345151 crossover junction endodeoxyribonuclease (ruvC) {Escherichia coli} 78.5 88.3 163
HI0955 1011537 1012736 dfp protein (dfp) {Escherichia coli} 61.1 76.8 402
HI0210 223259 224116 DNA adenine methylase (dam) {Escherichia coli} 55.4 71.4 266
HI1267 1343755 1341116 DNA gyrase, subunit A (gyrA) {Escherichia coli} 70.6 84.9 859
HI0569 587397 584980 DNA gyrase, subunit B (gyrB) {Escherichia coli} 74.7 85.9 803
HI1191 1255302 1253122 DNA helicase II (uvrD) {Haemophilus influenzae} 96.8 97.5 727
HI1102 1162989 1160953 DNA ligase (lig) {Escherichia coli} 63.7 79.9 666
HI0405 423539 424207 DNA mismatch protein (mutH) {Escherichia coli} 60.4 80.7 212
HI0709 750565 753147 DNA mismatch repair protein (mutS) {Escherichia coli} 71.0 84.0 853
HI0067 69622 71508 DNA mismatch repair protein MUTL (mutL) {Escherichia coli} 50.2 67.3 612
HI0858 904919 902130 DNA polymerase I (polA) {Escherichia coli} 63.1 77.0 928
HI0994 1055297 1054200 DNA polymerase III beta-subunit (dnaN) {Escherichia coli} 62.6 80.3 366
HI0457 476761 475763 DNA polymerase III delta prime
subunit (holB) {Escherichia coli} 35.3 57.4 316
HI0925 979730 980761 DNA polymerase III delta subunit (holA) {Escherichia coli} 45.2 62.0 332
HI0138 152669 151902 DNA polymerase III epsilon subunit (dnaQ) {Escherichia coli} 61.3 76.5 236
HI0741 799019 795544 DNA polymerase III, alpha chain (dnaE) {Escherichia coli} 71.9 85.7 1159
HI1402 1493690 1493259 DNA polymerase III, chi subunit (holC) {Haemophilus influenzae} 98.9 98.9 88
HI0011 11672 11271 DNA polymerase III, psi subunit (holD) {Escherichia coli} 34.4 73.8 571
HI0534 553659 555645 DNA primase (dnaG) {Escherichia coli} 56.5 73.8 571
HI1746 1826037 1823959 DNA recombinase (recG) {Escherichia coli} 66.5 80.1 693
HI0070 77166 75493 DNA repair protein (recN) {Escherichia coli} 48.6 67.3 533
HI0659 699507 700058 DNA topoisomerase I (topA) {Bacillus subtilis} 34.2 55.0 110
HI0656 698124 697570 DNA-3-methyladenine glycosidase I (tagI) {Escherichia coli} 62.6 76.0 179
HI0730 779457 781969 DNA-dependent ATPase, DNA helicase (recQ) {Escherichia coli} 62.9 77.6 589
HI0568 584860 584159 dod protein (dod) {Serratia marcescens} 81.4 93.3 210
HI0062 65230 65664 dosage-dependent dnaK suppressor protein (dksA) {Escherichia coli} 73.9 83.8 142
HI0948 1005798 1004986 formamidopyrimidine-DNA glycosylase (fpg) {Escherichia coli} 57.6 74.7 269
HI0584 602405 600519 glucose inhibited division protein (gidA) {Escherichia coli} 76.1 87.3 627
HI0488 506816 506208 glucose inhibited division protein (gidB) {Escherichia coli} 64.0 78.0 200
HI0982 1037496 1037792 Hin recombinational enhancer binding protein (fis) {Escherichia coli} 81.6 92.9 97
HI0514 528338 527565 HincII endonuclease (HincII) {Haemophilus influenzae} 98.4 98.4 258
HI1397 1491189 1490263 HindIII modification methyltransferase (hindIIIM) {Haemophilus influenzae} 99.4 99.4 309
HI1398 1492072 1491173 HindIII restriction endonuclease (hindIIIR) {Haemophilus influenzae} 99.7 99.7 300
HI0315 345085 344474 holliday junction DNA helicase (ruvA) {Escherichia coli} 58.8 79.9 203
HI0314 344463 343459 holliday
junction DNA helicase (ruvB) {Escherichia coli} 80.9 90.0 330
HI0678 719064 718180 integrase/recombinase protein (xerC) {Escherichia coli} 58.0 74.4 293
HI1316 1391102 1391389 integration host factor alpha-subunit (himA) {Escherichia coli} 63.8 83.0 94
HI1224 1291400 1291681 integration host factor beta-subunit (IHF-beta) (himD) {Escherichia coli} 56.5 77.2 92
HI0404 422970 423539 methylated-DNA--protein-cysteine methyltransferase (dat1) {Bacillus subtilis} 40.1 61.7 163
HI0671 713369 713806 mioC protein (mioC) {Escherichia coli} 53.5 71.5 144
HI1043 1104813 1105724 modification methylase HgiDI (MHgiDI) {Herpetosiphon aurantiacus} 56.4 70.5 297
HI0515 529891 528338 modification methylase HincII (hincIIM) {Haemophilus influenzae} 98.2 98.6 502
HI0912 963611 964312 mutator mutT (AT-GC transversion) {Escherichia coli} 48.8 72.0 125
HI0193 206098 206688 negative modulator of initiation of replication (seqA) {Escherichia coli} 53.1 71.8 177
HI0548 568202 567879 primosomal protein n precursor (priB) {Escherichia coli} 57.4 75.2 101
HI0341 367532 365343 primosomal protein replication factor (priA) {Escherichia coli} 52.3 70.2 729
HI0389 406402 408321 probable ATP-dependent helicase (dinG) {Escherichia coli} 32.2 51.1 680
HI0993 1054243 1053119 recF protein (recF) {Escherichia coli} 57.0 75.8 356
HI0334 358532 359239 recD protein (recD) {Escherichia coli} 64.6 76.5 226
HI0602 621957 620896 recombinase (recA) {Haemophilus influenzae} 100.0 100.0 354
HI0061 64971 62573 recombination protein (rec2) {Haemophilus influenzae} 99.9 99.9 800
HI0445 464118 464717 recR protein (recR) {Escherichia coli} 74.9 88.4 199
HI0601 620735 620358 regulatory protein (recX) {Pseudomonas fluorescens} 28.6 50.4 117
HI0651 694862 692768 rep helicase (rep) {Escherichia coli} 66.9 82.7 669
HI1232 1299240 1297177 replication protein (dnaX) {Escherichia coli} 52.9 69.8 643
HI1580 1641089 1642600 replicative DNA helicase (dnaB) {Escherichia coli} 68.6 82.8 462
HI1042 1103812 1104813 restriction enzyme (hgiDIR) {Herpetosiphon
giganteus} 44.2 63.9 350
HI1175 1241423 1242574 S-adenosylmethionine synthetase 2 (metX) {Escherichia coli} 82.3 91.7 383
HI1429 1512463 1511552 shufflon-specific DNA recombinase (rci) {Escherichia coli} 31.1 55.5 259
HI0251 281830 282333 single-stranded DNA binding protein (ssb) {Haemophilus influenzae} 95.8 98.2 168
HI1578 1639113 1638016 site-specific recombinase (rcb) {Escherichia coli} 36.3 57.0 265
HI1368 1450325 1452928 topoisomerase I (topA) {Escherichia coli} 72.0 84.3 865
HI0446 464736 466688 topoisomerase III (topB) {Escherichia coli} 65.9 79.4 645
HI1535 1599641 1601881 topoisomerase IV subunit A (parC) {Escherichia coli} 71.4 85.4 727
HI1534 1597676 1599571 topoisomerase IV subunit B (parE) {Escherichia coli} 76.5 88.6 630
HI1261 1331575 1335011 transcription-repair coupling factor (trcF) (mfd) {Escherichia coli} 64.3 82.7 1134
HI0217 232884 234038 type I restriction enzyme ecokI specificity protein (hsdS) {Escherichia coli} 36.1 58.6 394
HI0216 231281 232797 type I restriction enzyme ECOR124/3 I M protein (hsdM) {Escherichia coli} 81.2 89.3 512
HI1290 1368549 1367223 type I restriction enzyme ECOR124/3 I M protein (hsdM) {Escherichia coli} 30.4 53.7 332
HI1288 1365756 1362592 type I restriction enzyme ECOR124/3 R protein (hsdR) {Escherichia coli} 30.4 52.7 991
HI1059 1123091 1121205 type III restriction-modification ECOP15 enzyme (mod) {Escherichia coli} 36.5 55.5 384
HI0018 18087 18743 uracil DNA glycosylase (ung) {Escherichia coli} 70.2 79.5 215
HI0311 342051 342941 xprB protein (xerD) {Escherichia coli} 68.9 84.8 296
Degradation of DNA
HI1695 1758680 1759312 endonuclease III (nth) {Escherichia coli} 83.4 91.9 211
HI0250 278528 281829 excinuclease ABC subunit A (uvrA) {Escherichia coli} 81.2 91.0 940
HI1250 1323924 1321888 excinuclease ABC subunit B (uvrB) {Escherichia coli} 78.0 87.7 669
HI0057 58893 57067 excinuclease ABC subunit C (uvrC) {Escherichia coli} 65.9 80.0 588
HI1380 1471626 1473044 exodeoxyribonuclease I (sbcB) {Escherichia coli} 57.5 74.9 462
HI1324 1395898
1399530 exodeoxyribonuclease V (recB) {Escherichia coli} 37.1 58.2 1165
HI0944 998895 1002257 exodeoxyribonuclease V (recC) {Escherichia coli} 40.1 61.2 1114
HI1325 1399533 1401452 exodeoxyribonuclease V (recD) {Escherichia coli} 40.0 59.3 570
HI0041 43872 43072 exonuclease III (xthA) {Escherichia coli} 71.9 83.9 267
HI0399 417972 419288 exonuclease VII, large subunit (xseA) {Escherichia coli} 57.8 74.4 437
HI1217 1280795 1282519 single-stranded-DNA-specific exonuclease (recJ) {Escherichia coli} 59.2 77.3 554
Transcription
RNA synthesis, modification and DNA transcription
HI0618 647724 650492 ATP-dependent helicase HEPA (hepA) {Escherichia coli} 53.6 73.6 968
HI0424 444751 443435 ATP-dependent RNA helicase (srmB) {Escherichia coli} 39.8 60.9 448
HI0232 260978 262816 ATP-dependent RNA helicase DEAD (deaD) {Escherichia coli} 64.0 78.6 613
HI0804 851485 852468 DNA-directed RNA polymerase alpha chain (rpoA) {Escherichia coli} 91.8 97.0 329
HI0517 534212 538870 DNA-directed RNA polymerase beta chain (rpoB) {Salmonella typhimurium} 83.3 91.9 1342
HI0516 534211 529967 DNA-directed RNA polymerase beta chain (rpoC) {Escherichia coli} 83.0 90.7 1399
HI1307 1383078 1383509 N utilization substance protein B (nusB) {Escherichia coli} 54.9 71.4 133
HI0063 65915 67269 plasmid copy number control protein (pcnB) {Escherichia coli} 55.7 73.4 404
HI0230 257702 259828 polynucleotide phosphorylase (pnp) {Escherichia coli} 74.2 86.7 708
HI0894 944630 945883 putative ATP-dependent RNA helicase (rhlB) {Escherichia coli} 73.9 84.1 410
HI1748 1828594 1828331 RNA polymerase omega subunit (rpoZ) {Escherichia coli} 64.8 76.1 88
HI1463 1542205 1541624 sigma factor (algU) {Pseudomonas aeruginosa} 27.6 48.8 168
HI0719 764847 765401 transcription antitermination protein (nusG) {Escherichia coli} 73.7 84.4 179
HI0571 589932 590405 transcription elongation factor (greB) {Escherichia coli} 61.5 79.5 156
HI1286 1358486 1360006 transcription factor (nusA) {Salmonella typhimurium} 70.8 84.1 499
HI0297 328437 329696 transcription
termination factor rho (rho) {Escherichia coli} 87.4 95.2 419
Degradation of RNA
HI0219 234848 237923 anticodon nuclease masking-agent (prD) {Escherichia coli} 72.9 85.6 291
HI1739 1810586 1808610 exoribonuclease II (RNaseII) {Escherichia coli} 50.8 66.0 588
HI0392 411354 412550 ribonuclease D (rnd) {Escherichia coli} 41.3 65.5 365
HI0415 433540 436392 ribonuclease E (rne) {Escherichia coli} 60.3 72.3 1058
HI0139 152730 153191 ribonuclease H (rnh) {Escherichia coli} 64.9 76.0 154
HI1061 1124258 1123668 ribonuclease HII (EC 3.1.26.4) (RNASE HII) {Escherichia coli} 73.7 82.8 185
HI0014 14422 13742 ribonuclease III (rnc) {Escherichia coli} 65.3 80.2 221
HI0275 306539 305826 ribonuclease PH (rph) {Escherichia coli} 78.9 87.8 237
HI1001 1063336 1063743 RNase P (rnpA) {Escherichia coli} 69.7 80.7 119
HI0326 351726 352412 RNase T (rnt) {Escherichia coli} 65.7 80.9 204
Translation
Ribosomal proteins - synthesis, modification
HI0518 539557 538871 ribosomal protein L1 (rpL1) {Escherichia coli} 85.6 93.4 229
HI0642 681369 681857 ribosomal protein L10 (rpL10) {Salmonella typhimurium} 80.5 89.0 165
HI0519 539990 539565 ribosomal protein L11 (rpL11) {Escherichia coli} 86.6 94.4 142
HI0980 1035484 1036371 ribosomal protein L11 methyltransferase (prmA) {Escherichia coli} 69.2 83.2 291
HI1447 1530773 1530348 ribosomal protein L13 (rpL13) {Haemophilus somnus} 94.4 95.8 142
HI0790 844379 844747 ribosomal protein L14 (rpL14) {Escherichia coli} 94.3 98.4 123
HI0799 847996 848427 ribosomal protein L15 (rpL15) {Escherichia coli} 82.6 91.0 144
HI0786 842244 842651 ribosomal protein L16 (rpL16) {Escherichia coli} 89.7 95.6 136
HI0805 852512 852895 ribosomal protein L17 (rp10) {Escherichia coli} 89.8 92.1 127
HI0796 846938 847288 ribosomal protein L18 (rpL18) {Escherichia coli} 84.6 91.5 117
HI0202 216787 216440 ribosomal protein L19 (rpL19) {Escherichia coli} 89.5 98.2 114
HI0782 840039 840857 ribosomal protein L2 (rpL2) {Escherichia coli} 85.7 93.4 273
HI1323 1395432 1395782 ribosomal protein L20 (rpL20) {Escherichia coli}
94.0 96.6 117
HI0882 932097 931789 ribosomal protein L21 (rpL21) {Escherichia coli} 79.6 86.4 103
HI0784 841173 841502 ribosomal protein L22 (rpL22) {Escherichia coli} 91.8 97.3 110
HI0781 839722 840018 ribosomal protein L23 (rpL23) {Escherichia coli} 71.7 82.8 99
HI0791 844761 845069 ribosomal protein L24 (rpL24) {Escherichia coli} 76.7 86.4 103
HI1636 1692153 1692437 ribosomal protein L25 (rpL25) {Escherichia coli} 61.9 77.4 84
HI0881 931428 931788 ribosomal protein L27 (rpL27) {Escherichia coli} 87.1 90.6 85
HI0953 1010494 1010261 ribosomal protein L28 (rpL28) {Escherichia coli} 85.7 94.8 77
HI0787 842654 842842 ribosomal protein L29 (rpL29) {Escherichia coli} 75.8 87.1 62
HI0779 838481 839104 ribosomal protein L3 (rpL3) {Escherichia coli} 85.2 92.3 209
HI0798 847813 847989 ribosomal protein L30 (rpL30) {Escherichia coli} 79.7 86.4 59
HI0760 821826 821617 ribosomal protein L31 (rpL31) {Escherichia coli} 71.4 85.7 70
HI0159 174441 174274 ribosomal protein L32 (rpL32) {Escherichia coli} 77.2 86.0 57
HI0952 1010246 1010079 ribosomal protein L33 (rpL33) {Escherichia coli} 81.5 90.7 54
HI1000 1063233 1063364 ribosomal protein L34 (rpL34) {Escherichia coli} 86.4 93.2 44
HI1322 1395096 1395269 ribosomal protein L35 (rpL35) {Escherichia coli} 75.0 90.6 32
HI0780 839123 839722 ribosomal protein L4 (rpL4) {Escherichia coli} 83.6 93.0 201
HI0792 845090 845626 ribosomal protein L5 (rpL5) {Escherichia coli} 90.5 96.1 179
HI0795 846391 846921 ribosomal protein L6 (rpL6) {Escherichia coli} 75.1 90.4 177
HI0643 681915 682283 ribosomal protein L7/L12 (rpL7/L12) {Escherichia coli} 82.0 91.8 121
HI0546 567619 567173 ribosomal protein L9 (rpL9) {Escherichia coli} 72.5 85.9 149
HI1223 1289629 1291274 ribosomal protein S1 (rpS1) {Escherichia coli} 79.3 88.7 557
HI0778 838108 838461 ribosomal protein S10 (rpS10) {Escherichia coli} 98.1 99.0 103
HI0802 850416 850802 ribosomal protein S11 (rpS11) {Escherichia coli} 92.2 96.1 129
HI0801 850045 850397 ribosomal protein S13 (rpS13) {Escherichia coli} 86.4 93.2 118
HI0793 845641 845943 ribosomal protein S14 (rpS14) {Escherichia coli} 89.9 94.9 99
HI1331 1405806 1406072 ribosomal protein S15 (rpS15) {Escherichia coli} 80.9 86.5 89
HI1473 1554091 1553825 ribosomal protein S15 (rpS16) {Escherichia coli} 80.9 86.5 89
HI0205 218422 218177 ribosomal protein S16 (rpS16) {Escherichia coli} 70.7 85.4 82
HI0788 842845 843099 ribosomal protein S17 (rpS17) {Escherichia coli} 70.7 85.4 82
HI0547 567863 567639 ribosomal protein S18 (rpS18) {Escherichia coli} 85.7 94.0 84
HI0783 840886 841158 ribosomal protein S19 (rpS19) {Escherichia coli} 92.0 94.7 75
HI0915 967289 968041 ribosomal protein S2 (rpS2) {Escherichia coli} 82.2 89.2 241
HI0533 553446 553658 ribosomal protein S21 (rpS21) {Escherichia coli} 83.1 87.3 71
HI0785 841523 842227 ribosomal protein S3 (rpS3) {Escherichia coli} 87.2 93.2 233
HI0803 850833 851450 ribosomal protein S4 (rpS4) {Escherichia coli} 89.3 94.7 206
HI0797 847306 847803 ribosomal protein S5 (rpS5) {Escherichia coli} 92.8 95.8 166
HI0549 568566 568192 ribosomal protein S6 (rpS6) {Escherichia coli} 76.8 87.2 125
HI1537 1604087 1603182 ribosomal protein S6 modification protein (rimK) {Escherichia coli} 45.3 69.0 272
HI0582 599803 599336 ribosomal protein S7 (rpS7) {Escherichia coli} 89.7 94.2 155
HI0794 845983 846372 ribosomal protein S8 (rpS8) {Escherichia coli} 86.2 90.8 130
HI1446 1530328 1529939 ribosomal protein S9 (rpS9) {Haemophilus somnus} 94.6 98.5 130
HI0010 11292 10828 ribosomal-protein-alanine acetyltransferase (rimI) {Escherichia coli} 55.9 73.1 144
HI0583 600334 599963 streptomycin resistance protein (strA) {Haemophilus influenzae} 100.0 100.0 124
Amino acyl tRNA synthetases, tRNA modification
HI0816 865547 862926 alanyl-tRNA synthetase (alaS) {Escherichia coli} 68.2 82.6 873
HI1589 1648685 1650415 arginyl-tRNA synthetase (argS) {Escherichia coli} 71.2 83.5 577
HI1305 1382405 1380975 asparaginyl-tRNA synthetase (asnS) {Escherichia coli} 80.6 90.8 465
HI0319 348931 347168 aspartyl-tRNA synthetase (aspS) {Escherichia coli} 76.2 85.5
585
HI0078 85367 83991 cys-tRNA synthetase (cysS) {Escherichia coli} 75.7 87.0 461
HI0710 753356 754738 cysteinyl-tRNA (ser) selenium transferase (selA) {Escherichia coli} 58.8 75.8 454
HI1357 1431798 1433466 glutaminyl-tRNA synthetase (glnS) {Escherichia coli} 75.7 86.9 547
HI0276 308282 306843 glutamyl-tRNA synthetase (gltX) {Escherichia coli} 72.4 84.3 464
HI0929 985024 984119 glycyl-tRNA synthetase alpha chain (glyQ) {Escherichia coli} 90.6 94.6 299
HI0926 983065 981002 glycyl-tRNA synthetase beta chain (glyS) {Escherichia coli} 69.7 81.9 689
HI0371 392076 393344 histidine-tRNA synthetase (hisS) {Escherichia coli} 66.8 79.1 421
HI0964 1021072 1018250 isoleucyl-tRNA ligase (ileS) {Escherichia coli} 66.0 78.5 934
HI0923 976547 979129 leucyl-tRNA synthetase (leuS) {Escherichia coli} 72.3 82.2 859
HI1214 1278435 1276930 lysyl-tRNA synthetase (lysU) {Escherichia coli} 70.2 84.3 505
HI0838 885271 886269 lysyl-tRNA synthetase analog (genX) {Escherichia coli} 62.7 78.5 331
HI0625 662613 663566 methionyl-tRNA formyltransferase (fmt) {Escherichia coli} 65.0 77.4 313
HI1279 1353301 1351256 methionyl-tRNA synthetase (metG) {Escherichia coli} 69.0 83.3 677
HI0396 416278 415697 peptidyl-tRNA hydrolase (pth) {Escherichia coli} 64.2 80.5 190
HI1314 1387690 1388676 phenylalanyl-tRNA synthetase alpha-subunit (pheS) {Escherichia coli} 75.0 82.0 327
HI1315 1388713 1391097 phenylalanyl-tRNA synthetase beta-subunit (pheT) {Escherichia coli} 65.3 80.1 795
HI0731 781970 783684 prolyl-tRNA synthetase (proS) {Escherichia coli} 74.9 86.8 570
HI1650 1709685 1708879 pseudouridylate synthase I (hisT) {Escherichia coli} 69.2 82.7 260
HI0246 273589 272501 queuosine biosynthesis protein (queA) {Escherichia coli} 72.5 85.7 346
HI0201 215333 216439 selenium metabolism protein (selD) {Escherichia coli} 66.1 80.6 330
HI0110 117234 118520 seryl-tRNA synthetase (serS) {Escherichia coli} 77.6 86.5 430
HI1370 1453876 1455804 threonyl-tRNA synthetase (thrS) {Escherichia coli} 77.9 86.1 642
HI0245 272154 271009 transfer
RNA-guanine transglycosylase (tgt) {Escherichia coli} 81.3 91.5 374
HI0203 217564 216827 tRNA (guanine-N1)-methyltransferase (M1G-methyltransferase) (trmD) {Escherichia coli} 83.2 93.0 244
HI0850 894301 895389 tRNA (uracil-5-)-methyltransferase (trmA) {Escherichia coli} 64.6 80.4 362
HI0068 71519 72451 tRNA delta(2)-isopentenylpyrophosphate transferase (trpX) {Escherichia coli} 69.8 87.4 300
HI1612 1671420 1672667 tRNA nucleotidyltransferase (cca) {Escherichia coli} 58.4 73.4 404
HI0242 270097 269807 tRNA-guanine-transglycosylase (tgt) {Escherichia coli} 62.4 81.7 92
HI0639 678958 677957 tryptophanyl-tRNA synthetase (trpS) {Escherichia coli} 78.1 86.2 334
HI1616 1676533 1675331 tyrosyl tRNA synthetase (tyrS) {Thiobacillus ferrooxidans} 53.6 72.6 398
HI1396 1490259 1487398 valyl-tRNA synthetase (valS) {Escherichia coli} 70.8 83.3 951
Nucleoproteins
HI0187 200140 200544 DNA binding protein (probable) {Bacillus subtilis} 43.4 64.2 106
HI1496 1568461 1568685 DNA-binding protein (rdgB) {Erwinia carotovora} 42.4 60.6 67
HI1593 1655153 1655554 DNA-binding protein H-NS (hna) {Escherichia coli} 47.4 65.2 135
HI0432 453511 453104 DNA-binding protein HU-ALPHA (NS2) (HU-2) {Escherichia coli} 78.9 86.7 90
Proteins - translation and modification
HI0848 893035 893757 disulfide oxidoreductase (per) {Haemophilus influenzae} 100.0 100.0 205
HI0987 1042200 1041082 DNA processing chain A (dprA) {Escherichia coli} 44.8 60.2 358
HI0916 968177 969025 elongation factor EF-Ts (tsf) {Escherichia coli} 71.4 85.0 260
HI0580 597082 595901 elongation factor EF-Tu (duplicate) (tufB) {Escherichia coli} 92.6 95.9 394
HI0634 671167 672348 elongation factor EF-Tu (duplicate) (tufB) {Escherichia coli} 92.6 95.9 394
HI0581 599249 597150 elongation factor G (fusA) {Escherichia coli} 64.6 92.0 704
HI0330 355617 355054 elongation factor P (efp) {Escherichia coli} 75.0 85.6 188
HI0069 72460 75402 glutamate-ammonia-ligase adenylyltransferase (glnE) {Escherichia coli} 52.5 69.7 914
HI1321 1394551 1394954 initiation factor 3 (infC)
{Escherichia coli} 82.8 94.8 134
HI0550 569019 568768 initiation factor IF-1 (infA) {Escherichia coli} 94.4 98.6 72
HI1287 1360021 1362507 initiation factor IF-2 (infB) {Escherichia coli} 70.9 84.5 842
HI1155 1218859 1220211 maturation of antibiotic MccB17 (pmbA) {Escherichia coli} 60.8 78.7 450
HI1728 1794724 1793921 methionine aminopeptidase (map) {Escherichia coli} 64.3 79.8 262
HI0430 450570 451100 oxido-reductase (dsbB) {Escherichia coli} 43.8 68.8 174
HI1215 1279684 1278589 peptide chain release factor 2 (prfB) {Salmonella typhimurium} 81.7 93.7 365
HI1741 1811636 1813216 peptide-chain-release factor 3 (prfC) {Escherichia coli} 86.0 93.4 527
HI0079 85470 85976 peptidyl-prolyl cis-trans isomerase B (ppiB) {Escherichia coli} 71.3 80.5 163
HI1567 1631427 1630345 polypeptide chain release factor 1 (prfA) {Salmonella typhimurium} 72.5 68.3 360
HI0624 662011 662517 polypeptide deformylase (formylmethionine deformylase) (def) {Escherichia coli} 65.1 79.9 169
HI0810 857270 856716 ribosome releasing factor (frr) {Escherichia coli} 66.1 64.9 185
HI0575 593158 592940 rotamase, peptidyl prolyl cis-trans isomerase (slyD) {Escherichia coli} 50.7 73.1 67
HI0701 745982 745413 rotamase, peptidyl prolyl cis-trans isomerase (slyD) {Escherichia coli} 68.3 79.4 187
HI1334 1408450 1408923 transcription elongation factor (greA) {Escherichia coli} 79.7 89.9 158
HI0711 754738 756593 translation factor (selB) {Escherichia coli} 44.0 64.7 606
HI1216 1279817 1280503 xprA protein (xprA) {Escherichia coli} 45.4 67.4 227
Degradation of proteins, peptides, glycopeptides
HI0877 927500 928801 aminopeptidase A (pepA) {Rickettsia prowazekii} 39.6 57.9 313
HI1711 1775967 1777439 aminopeptidase a/i (pepA) {Escherichia coli} 57.3 77.5 497
HI1620 1682194 1679588 aminopeptidase N (pepN) {Escherichia coli} 60.9 75.6 864
HI0818 867554 866265 aminopeptidase P (pepP) {Escherichia coli} 54.6 73.6 435
HI0716 762461 763039 ATP-dependent clp protease proteolytic component (clpP) {Escherichia coli} 71.0 88.1 193
HI0717 763052 764284
ATP-dependent protease ATPase subunit (clpX) {Escherichia coli} 70.2 83.2 413
HI0861 906379 908946 ATP-dependent protease binding subunit (clpB) {Escherichia coli} 77.4 88.6 857
HI0421 440910 442289 collagenase activity collagenase (prtC) {Porphyromonas gingivalis} 31.1 53.4 206
HI0151 166695 165811 HFLC protein (hflC) {Escherichia coli} 58.5 78.2 329
HI0248 274175 276400 IgA1 protease (iga1) {Haemophilus influenzae} 28.6 51.5 759
HI0992 1047674 1053118 IgA1 protease (iga1) {Haemophilus influenzae} 99.8 99.9 1702
HI0249 278527 276401 IgA1 protease (iga1) {Haemophilus influenzae} 45.2 62.5 791
HI1327 1402067 1403869 lon protease (lon) {Bacillus brevis} 24.2 46.6 714
HI0215 229004 231046 oligopeptidase A (prlD) {Escherichia coli} 72.0 84.8 678
HI0677 716670 718121 peptidase D (pepD) {Escherichia coli} 55.8 72.2 485
HI0589 608542 607865 peptidase E (pepE) {Escherichia coli} 41.4 60.0 214
HI1351 1423832 1425067 peptidase T (pepT) {Salmonella typhimurium} 53.3 71.4 398
HI1262 1336467 1335070 periplasmic serine protease Do and heat shock protein (htrA) {Escherichia coli} 55.8 73.9 469
HI1603 1664636 1663212 probable ATP-dependent protease (sms) {Escherichia coli} 80.0 92.2 460
HI0724 768169 768786 proline dipeptidase (pepQ) {Escherichia coli} 53.7 70.2 204
HI0137 151209 151901 protease (prtH) {Porphyromonas gingivalis} 52.6 64.9 57
HI1547 1613228 1611384 protease IV (sppA) {Escherichia coli} 43.7 64.0 607
HI0152 167927 166698 protease specific for phage lambda cII repressor (hflK) {Escherichia coli} 55.8 72.6 396
HI1688 1751031 1752089 putative protease (sohB) {Escherichia coli} 53.3 74.5 348
HI0532 553214 552189 sialoglycoprotease (gcp) {Pasteurella haemolytica} 81.8 91.5 319
Transport/binding proteins
Amino acids, peptides, amines
HI1183 1247387 1246659 arginine transport ATP-binding protein artP (artP) {Escherichia coli} 65.8 83.1 242
HI1180 1245250 1244570 arginine transport system permease protein (artM) {Escherichia coli} 55.7 79.9 218
HI1181 1245915 1245253 arginine transport system permease
protein(artQ) {Escherichia coli} 59.0 77.8 229 HI0254 284235 283786 biopolymertransport protein (exbB) {Haemophilus influenzae} 96.0 98.7 150 HI0253283779 283339 biopolymer transport protein (exbD) {Escherichia coli}28.8 55.1 118 HI1734 1801710 1800520 branched chain aa transport systemII carrier protein (braB) {Pseudomonas 28.4 49.8 279 aeruginosa} HI0885935516 934149 D-alanine permease (dagA) {Alteromonas haloplanktis} 43.265.5 527 HI1188 1251117 1250128 dipeptide transport ATP-binding protein(dppD) {Escherichia coli} 74.2 84.0 326 HI1187 1250122 1249142 dipeptidetransport ATP-binding protein (dppF) {Escherichia coli} 76.4 87.1 325HI1126 1189626 1188709 dipeptide transport system permease protein(dppB) {Escherichia coli} 34.1 60.7 337 HI1190 1253029 1252031 dipeptidetransport system permease protein (dppB) {Escherichia coli} 61.1 79.2337 HI1189 1252013 1251130 dipeptide transport system permease protein(dppC) {Escherichia coli} 63.8 83.3 287 HI1536 1601926 1603137 glutamatepermease (gltS) {Escherichia coli} 53.9 73.0 391 HI1081 146102 1145389gtutamine transport system permease protein (glnP) {Escherichia coli}37.6 59.0 212 HI1082 1146859 1146089 glutamine-binding periplasmicprotein (glnH) {Escherichia coli} 28.4 48.2 222 HI0410 429066 428263leucine-specific transport protein (livG) {Escherichia coli} 28.1 55.2250 HI0227 255068 256375 membrane-associated component, LIV-II transportsystem (bmQ) 32.9 60.4 425 {Salmonella typhimurium} HI0214 228528 226987oligopeptide binding protein (oppA) {Escherichia coli} 31.7 53.5 473HI1127 1191333 1189710 oligopeptide binding protein (oppA) {Escherichiacoli} 52.6 69.0 527 HI1124 1187751 1186783 oligopeptide transportATP-binding protein (oppD) {Salmonella 77.2 85.0 320 typhimurium} HI11231186783 1185788 oligopeptide transport ATP-binding protein (oppF){Salmonella typhimurium} 71.5 83.9 329 HI1125 1188696 1187784oligopeptide transport system permease protein (oppC)C {Salmonella 71.187.4 300 typhimurium} HI1644 1702355 1704049 
peptide transportperiplasmic protein (sapA) {Salmonella typhimurium} 39.3 63.8 504 HI16471705898 1706944 peptide transport system ATP-binding protein (sapD){Salmonella 62.4 80.0 330 typhimurium} HI1646 1705007 1705891 dipeptidetransport system permease protein (dppC) {Escherichia coli} 36.2 59.9279 HI1645 1704052 1705014 peptide transport system permease protein(sapB) {Salmonella 34.4 63.8 319 typhimurium} HI1182 1246838 1245922periplasmic arginine-binding protein (artl) {Pasteurella haemolytica}58.6 73.4 234 HI1157 1221270 1222589 proton glutamate symport protein(gltP) {Bacillus caldotenax} 26.6 53.6 395 HI0592 611920 610616putrescine transport protein (potE) {Escherichia coli} 77.2 88.0 434HI0291 324543 323308 serine transporter (sdaC) {Escherichia coli} 61.077.8 411 HI1350 1423563 1422421 spermidine/putrescine transportATP-binding protein (potA) {Escherichia 68.1 83.1 378 HI1349 14224341421577 spermidine/putrescine transport system permease protein (potS)61.5 83.6 275 HI1348 1421548 1420808 spermidine/putrescine transportsystem permease protein (potC) 72.4 88.9 243 {Escherichia coli} HI0500514110 513175 spermidine/putrescine-binding periplasmic proteinprecursor (potD) 59.2 75.2 309 {Escherichia coli} HI1347 1420732 1419596spermidine/putrescine-binding periplasmic protein precursor (potD) 54.171.6 330 {Escherichia coli} HI0289 320539 321792 tryptophan-specificpermease (mtr) {Escherichia coli} 55.8 72.5 396 HI0479 497829 499028tyrosine-specific transport protein (tyrP) {Escherichia coli} 46.1 68.2401 HI0530 551559 550342 tyrosine-specific transport protein (tyrP){Escherichia coli} 45.4 65.4 404 Cations HI0255 284871 284407bacterioferritin comigratory protein (bcp) {Eschenchia. 
coli} 62.3 79.9154 HI1275 1347862 1348650 ferric enterobactin transport ATP-bindingprotein (fepC) {Escherichia coli} 29.4 51.3 238 HI1475 1555193 1554435ferric enterobactin transport ATP-binding protein (fepC) {Escherichiacoli} 33.2 54.8 220 HI1471 1549654 1551853 ferrichrome-iron receptor(fhuA) {Escherichia coli} 26.4 48.9 710 HI1388 1479930 1480475 ferritinlike protein (rsgA) {Escherichia coli} 57.4 79.0 162 HI1389 14804941480988 ferritin like protein (rsgA) {Escherichia coli} 57.3 73.8 164HI0363 385804 384887 iron(III) dicitrate transport ATP-binding proteinFECE {Escherichia coli} 35.9 56.4 220 HI1274 1347324 1347861 iron(III)dicitrate transport system permease protein (fecD) {Escherichia 36.064.0 255 coli} HI1037 1099321 1100265 magnesium and cobalt transportprotein (corA) {Escherichia coli} 70.3 84.8 316 HI0097 103798 104679major ferric iron binding protein precursor (fbp) {Neisseriagonorrhoeae} 69.7 82.3 293 HI1051 1114308 1114635 mercuric transportprotein (merT) {Pseudomonas aeruginosa} 25.0 55.2 99 HI1052 11146511114926 mercury scavenger protein (merP) {Pseudomonas fluorescens} 29.345.7 91 HI0294 327396 327193 mercury scavenger protein (merP){Pseudomonas fluorescens} 32.8 67.2 67 HI1531 1594953 1594219molybdate-binding periplasmic protein precursor (modB) {Azotobacter 21.743.0 245 vinelandii} HI0226 254880 253681 NA(+)/H(+) antiporter 1 (rhaA){Escherichia coli} 52.6 74.6 380 HI0429 448992 450557 Na+/H+ antiporter(nhaB) {Escherichia coli} 70.6 87.5 501 HI1110 1171933 1170530 Na+/H+antiporter (nhaC) {Bacillus firmus} 37.5 62.0 382 HI0098 104899 106317periplasmic-binding-protein-dependent iron transport protein (stuB) 38.159.5 457 {Serratia marcescens} HI1479 1558763 1558167periplasmic-binding-protein-dependent iron transport protein (sfuC) 39.958.0 197 {Serratia marcescens} HI0913 964424 966276 potassium effluxsystem (kelC) {Escherichia coli} 40.9 65.7 594 HI0292 326934 324769potassium/copper-transportING ATPase A (copA) {Enterococcus faecalis}42.9 64.4 723 
HI1355 1429787 1428276 sodium/proline symporter (prolinepermease) (putP) {Escherichia coli} 62.8 79.1 489 HI0252 283326 282517tonB protein (tonB) {Haemophilus influenzae} 96.2 98.5 261 HI0627 664922666362 TRK system potassium uptake protein (trkA) {Escherichia coli}65.8 83.4 458 Carbohydrates, organic alcohols & acids HI0020 22097 206612-oxoglutarate/malate translocator (SODiT1) {Spinacia oleracea} 35.859.6 452 HI0824 872894 873940 D-galactose-binding periplasmic protein(mglB) {Escherichia coli} 67.6 81.2 329 HI1113 1176024 1174516 D-xylosetransport ATP-binding protein (xylG) {Escherichia coli} 71.5 85.8 501HI1114 1177073 1176078 D-xylose-binding penplasmic protein (rbsB){Escherichia coli} 76.0 88.4 328 HI1718 1785024 1783300 enzyme I (ptsl){Salmonella typhimurium} 70.2 84.3 574 HI0182 194818 193967 formatetransporter (formate channel) {Escherichia coli} 53.2 73.4 263 HI0450471781 470285 fructose-permease IIA/FPR component (fruB) {Escherichiacoli} 51.5 68.3 374 HI0448 469337 467670 fructose-permease IIBCcomponent (truA) {Escherichia coli} 57.2 72.2 552 HI0614 643282 642851fucose operon protein (fucu) {Escherichia coli} 66.3 80.0 94 HI0692733673 734484 glpF protein (glpF) {Escherichia coli} 73.6 87.2 258HI1019 1080518 1081194 glpF protein (glpP) {Escherichia coli} 30.6 54.6208 HI1017 1078404 1079867 gluconate permease (gntP) {Bacillus subtilis}29.1 56.4 442 HI1717 1783237 1782740 glucose phosphotransferase enzymeIII-glc (crr) {Escherichia coli} 73.2 83.3 169 HI0688 729474 730914glycerol-3-phosphatase transporter (glpT) {Escherichia coli} 64.5 78.9445 HI0504 517869 519347 high affinity ribose transport protein (rbsA){Escherichia coli} 71.1 85.4 494 HI0505 519363 520331 high affinityribose transport protein (rbsC) {Escherichia coli} 68.0 86.5 303 HI0503517436 517852 high affinity ribose transport protein (rbsD) {Escherichiacoli} 59.0 78.4 139 HI0612 542139 640856 L-fucose permease (fucP){Escherichia coli} 35.6 57.9 413 HI1221 1288578 1286983 L-lactatepermease (lctP) 
{Escherichia coli} 30.2 53.9 532
HI1735 1802527 1801757 lactam utilization protein (lamB) {Emericella nidulans} 41.3 60.3 130
HI0825 874009 875526 mglA protein (mglA) {Escherichia coli} 73.9 84.6 506
HI0826 875546 876553 mglC protein (mglC) {Escherichia coli} 79.2 90.2 336
HI0506 520354 521229 periplasmic ribose-binding protein (rbsB) {Escherichia coli} 73.9 86.6 291
HI1719 1785361 1785107 phosphohistidinoprotein-hexose phosphotransferase (ptsH) {Escherichia coli} 77.6 88.2 85
HI0830 878480 878773 potassium channel homolog (kch) {Escherichia coli} 67.7 80.2 96
HI0154 170140 168807 putative aspartate transport protein (dcuA) {Escherichia coli} 46.4 69.9 436
HI0748 803856 805175 putative aspartate transport protein (dcuA) {Escherichia coli} 42.6 70.1 435
HI1112 1174509 1173385 ribose transport permease protein (xylH) {Escherichia coli} 69.8 84.1 371
HI1696 1759373 1760743 sodium- and chloride-dependent GABA transporter {Homo sapiens} 29.3 52.6 471
HI0738 790926 789403 sodium-dependent noradrenaline transporter {Homo sapiens} 31.1 54.2 523

Nucleosides, purines & pyrimidines
HI1089 1151815 1151024 ribonucleotide transport ATP-binding protein (mkl) {Mycobacterium leprae} 42.2 61.5 244
HI1230 1296319 1295078 uracil permease (uraA) {Escherichia coli} 37.2 61.6 400

Anions
HI1104 1164213 1165028 cysteine synthetase (cysZ) {Escherichia coli} 53.7 76.3 190
HI1697 1761825 1760773 hydrophilic membrane-bound protein (modC) {Escherichia coli} 55.9 74.5 263
HI1698 1762501 1761815 hydrophobic membrane-bound protein (modB) {Escherichia coli} 65.9 84.8 223
HI1384 1477430 1476585 integral membrane protein (pstA) {Escherichia coli} 59.6 77.6 272
HI0356 380045 380764 nitrate transporter ATPase component (nasD) {Klebsiella pneumoniae} 34.9 57.8 254
HI1383 1475710 1475584 peripheral membrane protein B (pstB) {Escherichia coli} 77.0 86.8 256
HI1385 1478379 1477435 peripheral membrane protein C (pstC) {Escherichia coli} 57.3 78.7 300
HI1386 1479246 1478473 periplasmic phosphate-binding protein (pstS) {Escherichia coli} 49.8 67.7 256
HI1387 1479247 1479929 periplasmic phosphate-binding protein (pstS) {Escherichia coli} 63.8 75.4 69
HI1610 1669474 1670733 phosphate permease (YBR296C) {Saccharomyces cerevisiae} 35.6 60.0 551

Other
HI0060 62564 60804 ATP dependent translocator homolog (msbA) {Haemophilus influenzae} 100.0 100.0 458
HI0623 653683 662010 ATP-binding protein (abc) {Escherichia coli} 74.0 86.5 200
HI1625 1686470 1686186 cystic fibrosis transmembrane conductance regulator {Bos taurus} 35.3 60.8 233
HI0855 899042 900688 heme-binding lipoprotein (dppA) {Haemophilus influenzae} 98.9 99.3 547
HI0266 295639 298353 heme-hemopexin-binding protein (hxuA) {Haemophilus influenzae} 82.1 89.5 928
HI1476 1556199 1555189 hemin permease (hemU) {Yersinia enterocolitica} 36.1 62.7 325
HI0264 291684 293852 hemin receptor precursor (hemR) {Yersinia enterocolitica} 28.5 45.9 678
HI1712 1779487 1777481 high-affinity choline transport protein (betT) {Escherichia coli} 34.7 61.6 653
HI0663 705327 703054 lactoferrin binding protein (lbpA) {Neisseria meningitidis} 30.2 47.9 763
HI0610 637954 639336 Na+/sulfate cotransporter {Rattus norvegicus} 34.4 57.8 562
HI0977 1032420 1033871 pantothenate permease (panF) {Escherichia coli} 60.2 77.9 478
HI0714 760739 757488 transferrin binding protein 1 precursor (tbp1) {Neisseria meningitidis} 29.9 48.6 894
HI0996 1059604 1056869 transferrin binding protein 1 precursor (tbp1) {Neisseria meningitidis} 51.2 69.5 885
HI1220 1286725 1283987 transferrin binding protein 1 precursor (tbp1) {Neisseria meningitidis} 28.4 46.8 902
HI0997 1061509 1059635 transferrin binding protein 2 precursor (tbp2) {Neisseria meningitidis} 39.9 54.7 692
HI0975 1029676 1030542 transferrin-binding protein (tfbA) {Actinobacillus pleuropneumoniae} 28.9 46.0 578
HI1571 1633105 1633993 transferrin-binding protein 1 (tbp1) {Neisseria meningitidis} 41.3 59.5 727
HI0637 676956 674098 transferrin-binding protein 1 (tbp2) {Neisseria gonorrhoeae} 31.6 51.7 828
HI0665 706622 708309 transport ATP-binding protein (cydD) {Escherichia coli} 26.4 54.0 561
HI1160 1226897 1225140 transport ATP-binding protein (cydD) {Escherichia coli} 50.7 73.5 588

Cellular processes

Chaperones
HI0544 565037 565324 chaperonin (groES) (mopB) {Escherichia coli} 87.5 94.8 96
HI0545 565350 566993 heat shock protein (groEL) (mopA) {Haemophilus ducreyi} 89.8 94.9 547
HI1241 1310497 1311678 heat shock protein (dnaJ) {Escherichia coli} 68.0 82.5 376
HI0104 111572 109680 heat shock protein C62.5 (htpG) {Escherichia coli} 75.4 88.3 621
HI0375 396463 394607 hsc66 protein (hsc66) {Escherichia coli} 69.2 82.0 616
HI1240 1308539 1310443 hsp70 protein (dnaK) {Escherichia coli} 78.5 88.2 638

Cell division
HI0771 831200 831853 cell division ATP-binding protein (ftsE) {Escherichia coli} 64.1 78.3 216
HI1211 1275245 1274358 cell division inhibitor (sulA) {Vibrio cholerae} 33.9 55.7 116
HI1145 1210058 1211332 cell division protein (ftsA) {Escherichia coli} 52.8 74.2 420
HI1338 1410017 1412129 cell division protein (ftsH) {Escherichia coli} 75.2 87.8 624
HI1470 1549516 1546374 cell division protein (ftsH) {Escherichia coli} 77.8 88.3 369
HI1337 1409390 1410016 cell division protein (ftsJ) {Escherichia coli} 81.7 90.4 208
HI1134 1196901 1197221 cell division protein (ftsL) {Escherichia coli} 36.6 60.4 101
HI1144 1209275 1210036 cell division protein (ftsQ) {Escherichia coli} 40.6 58.5 231
HI1140 1204467 1205648 cell division protein (ftsW) {Escherichia coli} 52.3 74.9 374
HI0770 829937 831178 cell division protein (ftsY) {Escherichia coli} 66.0 81.1 497
HI1146 1211419 1212681 cell division protein (ftsZ) {Escherichia coli} 67.2 83.1 306
HI1377 1465224 1469760 cell division protein (mukB) {Escherichia coli} 61.4 77.3 1455
HI1356 1429903 1431375 cytoplasmic axial filament protein (cafA) {Escherichia coli} 71.0 86.3 488
HI0772 831866 832795 ftsX protein (ftsX) {Escherichia coli} 43.5 69.9 292
HI1067 1128511 1129221 mukB suppressor protein (smbA) {Escherichia coli} 77.4 90.2 235
HI1135 1197237 1199067 penicillin-binding protein 3 (ftsI) {Escherichia coli} 52.8 70.7 564

Protein, peptide secretion
HI0016 17278 15485 GTP-binding membrane protein (lepA) {Escherichia coli} 85.6 91.0 597
HI1472 1551915 1553681 colicin V secretion ATP-binding protein (cvaB) {Escherichia coli} 29.9 56.0 373
HI1008 1070885 1071397 lipoprotein signal peptidase (lspA) {Escherichia coli} 51.3 71.5 158
HI1648 1706947 1707753 peptide transport system ATP-binding protein SAPF (sapF) {Escherichia coli} 49.6 70.8 264
HI0718 764525 764842 preprotein translocase (secE) {Escherichia coli} 40.6 62.3 106
HI0800 848438 849760 preprotein translocase SECY subunit (secY) {Escherichia coli} 74.7 86.9 443
HI0241 269734 267887 protein-export membrane protein (secD) {Escherichia coli} 59.6 77.3 615
HI0240 267876 266902 protein-export membrane protein (secF) {Escherichia coli} 48.0 73.0 302
HI0447 466800 467135 protein-export membrane protein (secG) {Escherichia coli} 58.9 81.3 110
HI0745 801965 801459 protein-export protein (secB) {Escherichia coli} 56.2 80.8 145
HI0911 961135 963837 secA protein (secA) {Escherichia coli} 68.0 81.7 896
HI0015 15473 14427 signal peptidase I (lepB) {Escherichia coli} 46.3 65.1 319
HI0106 114073 112688 signal recognition particle protein (54 homolog) (ffh) {Escherichia coli} 79.9 90.9 452
HI0715 761040 762335 trigger factor (tig) {Escherichia coli} 64.4 80.3 432
HI0298 330445 329756 type 4 prepilin-like protein specific leader peptidase (hopD) {Escherichia coli} 27.2 49.0 208
HI0299 331661 330445 xcpS protein (xcpS) {Pseudomonas putida} 29.2 56.7 396

Detoxification
HI0930 985290 986813 KW20 catalase (hktE) {Haemophilus influenzae} 99.2 99.4 508
HI1090 1152892 1152248 superoxide dismutase (sodA) {Haemophilus influenzae} 99.0 99.5 209
HI1004 1065726 1067108 thiophene and furan oxidation protein (thdF) {Escherichia coli} 73.8 85.4 451

Cell killing
HI0303 334801 335697 hemolysin (tlyC) {Serpulina hyodysenteriae} 36.9 57.5 252
HI1664 1723070 1723648 hemolysin, 21 kDa (hly) {Actinobacillus pleuropneumoniae} 54.5 72.4 156
HI1376 1464493 1465221 killing protein (kicA) {Escherichia coli} 69.0 83.6 222
HI1375 1463019 1464443 killing protein suppressor (kicB) {Escherichia coli} 66.9 83.0 440
HI1053 1116898 1115057 leukotoxin secretion ATP-binding protein (lktB) {Actinobacillus actinomycetemcomitans} 34.2 55.1 512

Transformation
HI0436 456360 455674 com101A protein (comF) {Haemophilus influenzae} 100.0 100.0 229
HI1010 1072519 1072854 competence locus E (comE1) {Bacillus subtilis} 46.7 70.0 59
HI0603 622277 622927 tfoX protein (tfoX) {Haemophilus influenzae} 99.5 99.5 217
HI0443 462729 463571 transformation gene cluster hypothetical protein (GB:M62809_1) (com) {Haemophilus influenzae} 100.0 100.0 281
HI0435 455595 455002 transformation gene cluster hypothetical protein (GB:M62809_10) (com) {Haemophilus influenzae} 99.5 99.5 198
HI0442 460047 462638 transformation gene cluster hypothetical protein (GB:M62809_2) (com) {Haemophilus influenzae} 100.0 100.0 864
HI0441 459948 459154 transformation gene cluster hypothetical protein (GB:M62809_3) (com) {Haemophilus influenzae} 100.0 100.0 265
HI0440 459150 458647 transformation gene cluster hypothetical protein (GB:M62809_4) (com) {Haemophilus influenzae} 100.0 100.0 168
HI0439 458647 458129 transformation gene cluster hypothetical protein (GB:M62809_5) (com) {Haemophilus influenzae} 100.0 100.0 173
HI0438 458129 457719 transformation gene cluster hypothetical protein (GB:M62809_6) (com) {Haemophilus influenzae} 100.0 100.0 137
HI0437 457706 456385 transformation gene cluster hypothetical protein (GB:M62809_7) (com) {Haemophilus influenzae} 99.8 99.8 441

Other categories

Colicin-related functions
HI0384 403297 402017 colicin tolerance protein (tolB) {Escherichia coli} 63.9 78.1 409
HI1209 1272281 1272769 colicin V production protein (pur regulon) (cvpA) {Escherichia coli} 64.7 79.5 156
HI0387 405650 404967 inner membrane protein (tolQ) {Escherichia coli} 68.8 83.3 221
HI0386 404892 404476 inner membrane protein (tolR) {Escherichia coli} 61.8 78.7 136
HI0385 404457 403342 outer membrane integrity protein (tolA) {Escherichia coli} 42.6 57.1 406
HI1691 1753623 1756079 outer membrane integrity protein (tolA) {Escherichia coli} 28.9 47.7 345

Phage-related functions and prophages
HI1493 1566955 1567509 E16 protein (muE16) {Bacteriophage mu} 28.5 52.8 143
HI1508 1576485 1576922 G protein (muG) {Bacteriophage mu} 38.3 52.5 147
HI1574 1636594 1636181 G protein (muG) {Bacteriophage mu} 33.3 54.0 138
HI1488 1564685 1565191 gam protein {Bacteriophage mu} 57.1 73.8 168
HI0071 78159 78860 heat shock protein B25.3 (grpE) {Escherichia coli} 45.9 66.5 193
HI0413 432108 431836 host factor-1 (HF-1) (hfq) {Escherichia coli} 90.5 97.3 74
HI1509 1577156 1578220 I protein (muI) {Bacteriophage mu} 50.0 55.4 58
HI1485 1563429 1564289 MuB protein (muB) {Bacteriophage mu} 46.4 70.4 277
HI1521 1584995 1586365 N protein (muN) {Bacteriophage mu} 31.5 52.1 452
HI1522 1586368 1587105 P protein {Bacteriophage mu} 39.5 67.3 220
HI1416 1505940 1505428 terminase subunit 1 {Bacteriophage SF6} 32.3 52.3 128
HI1483 1560600 1562660 transposase A (muA) {Bacteriophage mu} 40.6 60.1 596

Transposon-related functions
HI1106 1166078 1166803 insertion sequence IS1016(V-4) hypothetical protein (GB:X58176_2) {Haemophilus influenzae} 43.6 66.7 39
HI1020 1081916 1081346 IS1016-V6 protein (IS1016-V6) {Haemophilus influenzae} 91.7 93.8 191
HI1332 1406795 1406150 IS1016-V6 protein (IS1016-V6) {Haemophilus influenzae} 54.7 74.7 170
HI1583 1645515 1645991 IS1016-V6 protein (IS1016-V6) {Haemophilus influenzae} 45.4 61.2 153

Drug/analog sensitivity
HI0897 947919 951014 acriflavine resistance protein (acrB) {Escherichia coli} 32.7 55.0 1027
HI0302 333614 334165 ampD signalling protein (ampD) {Escherichia coli} 56.1 75.1 172
HI1245 1315822 1314629 bicyclomycin resistance protein (bcr) {Escherichia coli} 42.6 68.7 383
HI1629 1688581 1689111 mercury resistance regulatory protein (merR2) {Thiobacillus ferrooxidans} 37.7 57.5 105
HI0650 692523 691900 modulator of drug activity (mda66) {Escherichia coli} 58.1 75.4 191
HI0899 953570 952041 multidrug resistance protein (emrB) {Escherichia coli} 67.7 84.8 499
HI0900 954752 953583 multidrug resistance protein (emrA) {Escherichia coli} 46.5 66.3 389
HI0036 37441 39472 multidrug resistance protein (mdl) {Escherichia coli} 29.0 51.2 1094
HI1467 1543471 1544832 nodulation protein T (nodT) {Rhizobium leguminosarum} 20.1 46.3 390
HI0551 569189 570049 rRNA (adenosine-N6,N6-)-dimethyltransferase (ksgA) {Escherichia coli} 69.3 61.5 269
HI0513 527345 526362 tellurite resistance protein (tehA) {Escherichia coli} 38.9 62.0 317
HI1278 1351140 1350283 tellurite resistance protein (tehB) {Escherichia coli} 55.2 70.6 194

Radiation sensitivity
HI0954 1011412 1010711 radC protein (radC) {Escherichia coli} 49.8 71.7 219

Adaptations, atypical conditions
HI1532 1596570 1595143 autotrophic growth protein (aut) {Alcaligenes eutrophus} 45.0 60.9 154
HI0722 766921 767769 heat shock protein (htpX) {Escherichia coli} 66.3 82.1 288
HI1533 1596655 1597599 heat shock protein B (ibpB) {Escherichia coli} 55.9 71.2 304
HI0947 1003887 1004906 htrA-like protein (htrH) {Escherichia coli} 55.2 72.6 262
HI0903 956705 957292 invasion protein (invA) {Bartonella bacilliformis} 39.5 60.5 167
HI1550 1615090 1614485 NAD(P)H:menadione oxidoreductase {Mus musculus} 35.9 54.9 200
HI0460 479443 478505 survival protein (surA) {Escherichia coli} 33.0 58.5 424
HI0817 866160 865738 uspA protein (uspA) {Escherichia coli} 68.6 87.1 140
HI0323 350541 350774 virulence plasmid protein (vagC) {Salmonella dublin} 35.9 57.8 62
HI1254 1326770 1327090 virulence associated protein A (vapA) {Dichelobacter nodosus} 40.8 57.7 71
HI0324 350774 351175 virulence associated protein C (vapC) {Dichelobacter nodosus} 35.4 56.9 128
HI0949 1007984 1007589 virulence associated protein C (vapC) {Dichelobacter nodosus} 40.9 60.6 131
HI0452 472751 472479 virulence associated protein D (vapD) {Dichelobacter nodosus} 40.7 67.0 91
HI1310 1385051 1385680 virulence plasmid protein (mlgA) {Shewanella colwelliana} 23.8 56.3 124

Undetermined
HI1164 1230321 1229908 15 kDa protein (P15) {Escherichia coli} 49.3 68.4 136
HI0085 89585 88593 2-hydroxyacid dehydrogenases homolog (ddh) {Zymomonas mobilis} 51.5 72.8 324
HI0462 480185 480973 beta-lactamase regulatory homolog (mazG) {Escherichia coli} 48.3 72.6 257
HI1676 1738223 1737753 conjugative transfer co-repressor (finO) {Escherichia coli} 32.5 51.9 76
HI0309 340039 340851 delta-1-pyrroline-5-carboxylate reductase (proC) {Pseudomonas aeruginosa} 44.0 60.1 267
HI1555 1620490 1619810 devA protein (devA) {Anabaena sp.} 42.7 66.4 219
HI0558 576002 575514 devB protein (devB) {Anabaena sp.} 32.7 51.5 166
HI1342 1415087 1415473 embryonic abundant protein, group 3 {Triticum aestivum} 33.3 50.0 102
HI0939 996457 995658 extragenic suppressor (suhB) {Escherichia coli} 64.7 80.2 258
HI0370 390960 392063 GCPE protein (protein E) (gcpE) {Escherichia coli} 88.2 93.9 362
HI0095 102616 101864 GerC2 protein (gerC2) {Bacillus subtilis} 32.9 55.2 191
HI0669 712892 711894 glpX protein (glpX) {Escherichia coli} 69.2 83.4 325
HI1015 1076616 1077389 glyoxylate-induced protein {Escherichia coli} 39.1 57.8 258
HI0499 511702 513099 hslU protein (hslU) {Escherichia coli} 60.4 90.1 443
HI0498 511230 511754 hslV protein (hslV) {Escherichia coli} 79.8 89.0 172
HI1120 1184041 1182516 ilv-related protein {Escherichia coli} 59.7 77.0 504
HI0287 319073 317784 isochorismate synthase (entC) {Bacillus subtilis} 31.5 48.9 311
HI1624 1686217 1685567 membrane associated ATPase (cbiO) {Propionibacterium freudenreichii} 33.7 52.7 184
HI0463 481901 481029 membrane protein (lapB) {Pasteurella haemolytica} 34.2 56.0 221
HI1122 1184867 1185742 membrane protein (lapB) {Pasteurella haemolytica} 63.1 80.2 216
HI0590 608642 609874 N-carbamyl-L-amino acid amidohydrolase {Bacillus stearothermophilus} 35.9 59.2 406
HI0380 399796 398579 nitrogen fixation protein (nifS) {Anabaena sp.} 48.2 67.0 379
HI1298 1375045 1373735 nitrogen fixation protein (nifS) {Mycobacterium leprae} 33.4 56.2 402
HI1346 1418236 1417523 nitrogen fixation protein (nifS) {Mycobacterium leprae} 38.8 58.5 186
HI0379 398591 398139 nitrogen fixation protein (nifU) {Klebsiella pneumoniae} 50.8 74.2 122
HI0167 180354 181586 nitrogen fixation protein (mfE) {Rhodobacter capsulatus} 30.1 47.9 292
HI1692 1756087 1757160 nitrogen fixation protein (mfE) {Rhodobacter capsulatus} 32.7 59.5 290
HI0129 143015 144800 nitrogenase C (nifC) {Clostridium pasteurianum} 27.1 52.6 248
HI1480 1559124 1558768 nitrogenase C (nifC) {Clostridium pasteurianum} 40.9 60.2 92
HI0359 381523 382464 nmt1 protein (nmt1) {Aspergillus parasiticus} 25.6 54.7 289
HI1299 1375415 1374882 partitioning system protein (parB) {Plasmid RP4} 43.6 67.7 141
HI0224 252941 252168 rarD protein (rarD) {Escherichia coli} 26.5 53.0 230
HI0682 721733 720840 rarD protein (rarD) {Escherichia coli} 27.1 55.0 289
HI0918 970839 970249 skp protein (skp) {Pasteurella multocida} 55.5 76.4 191
HI0983 1038375 1037893 small protein (smpB) {Escherichia coli} 78.8 91.3 160
HI1598 1661468 1659882 spoIIIE protein (spoIIIE) {Coxiella burnetii} 56.1 74.5 504
HI0898 951407 952018 suppressor protein (msgA) {Escherichia coli} 30.2 56.1 254
HI1080 1145382 1144612 surfactin (sfpo) {Bacillus subtilis} 58.2 77.9 246
HI0753 811790 811296 toxR regulon (tagD) {Vibrio cholerae} 45.7 64.0 164
HI1412 1502860 1501311 traN protein (traN) {Plasmid RP4} 40.2 61.5 233
HI0666 708305 709960 transport ATP-binding protein (cydC) {Escherichia coli} 26.3 51.7 536
HI1159 1225137 1223410 transport ATP-binding protein (cydC) {Escherichia coli} 48.5 70.1 568
HI1562 1627239 1626295 vanH protein (vanH) {Transposon Tn1546} 39.7 57.1 251
HI0632 668489 669433 mucoid status locus protein (mucB) {Pseudomonas aeruginosa} 25.4 51.8 309
HI0172 183553 184785 phenolhydroxylase (ORF6) {Acinetobacter calcoaceticus} 33.0 56.9 313
HI1390 1481177 1481266 plasma protease C1 inhibitor {Homo sapiens} 75.0 79.2 23
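A single Table 1(a) entry, once separated from its neighbours, carries an HI locus number, two genome coordinates, a description, a source organism in braces, and three trailing numbers. The following is a minimal sketch of how such an entry could be split into fields. The field names, and the reading of the three trailing numbers as percent identity, percent similarity and match length, are assumptions made for illustration; `parse_row` is a hypothetical helper, not part of the disclosure.

```python
import re
from typing import NamedTuple, Optional


class Match(NamedTuple):
    orf_id: str
    start: int
    end: int
    description: str
    organism: str
    pct_identity: float    # assumed meaning of the first trailing number
    pct_similarity: float  # assumed meaning of the second trailing number
    length: int            # assumed meaning of the third trailing number


# One entry: HI number, two coordinates, free text, {organism}, three numbers.
ROW = re.compile(
    r"^(HI\d{4})\s+(\d+)\s+(\d+)\s+(.+?)\s*\{([^{}]+)\}\s+"
    r"(\d+(?:\.\d+)?)\s+(\d+(?:\.\d+)?)\s+(\d+)$"
)


def parse_row(line: str) -> Optional[Match]:
    """Return the parsed entry, or None for category headings and fragments."""
    m = ROW.match(line.strip())
    if m is None:
        return None
    orf_id, start, end, desc, organism, ident, simil, length = m.groups()
    return Match(orf_id, int(start), int(end), desc.strip(), organism.strip(),
                 float(ident), float(simil), int(length))


if __name__ == "__main__":
    row = "HI0825 874009 875526 mglA protein (mglA) {Escherichia coli} 73.9 84.6 506"
    print(parse_row(row))
    print(parse_row("Anions"))  # a category heading is not an entry
```

Returning `None` for lines that do not fit the pattern lets category headings such as "Anions" pass through without raising an error.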

[0271] TABLE 1(b) Previously known gene products of the H. influenzae Rd genome.
HI0060 ATP dependent translocator homolog (msbA)
HI0140 outer membrane protein P2 (ompP2)
HI0251 single-stranded DNA binding protein (ssb)
HI0252 tonB protein (tonB)
HI0266 heme-hemopexin-binding protein (hxuA)
HI0351 adenylate kinase (ATP-AMP transphosphorylase) (adk)
HI0352 hypothetical protein (SP:P24326)
HI0353 udp-glucose 4-epimerase (galactowaldenase) (galE)
HI0354 hypothetical protein (SP:P24324)
HI0383 PC protein (15kd peptidoglycan-associated outer membrane lipoprotein) (pal)
HI0403 outer membrane protein P1 (ompP1)
HI0435 transformation gene cluster hypothetical protein (GB:M62809_10) (com)
HI0436 com101A protein (comF)
HI0437 transformation gene cluster hypothetical protein (GB:M62809_7) (com)
HI0438 transformation gene cluster hypothetical protein (GB:M62809_6) (com)
HI0439 transformation gene cluster hypothetical protein (GB:M62809_5) (com)
HI0440 transformation gene cluster hypothetical protein (GB:M62809_4) (com)
HI0441 transformation gene cluster hypothetical protein (GB:M62809_3) (com)
HI0442 transformation gene cluster hypothetical protein (GB:M62809_2) (com)
HI0443 transformation gene cluster hypothetical protein (GB:M62809_1) (com)
HI0514 HincII endonuclease (HincII)
HI0515 modification methylase HincII (hincIIM)
HI0552 lipooligosaccharide biosynthesis protein
HI0583 streptomycin resistance protein (strA)
HI0602 recombinase (recA)
HI0603 tfoX protein (tfoX)
HI0606 adenylate cyclase (cyaA)
HI0622 28 kDa membrane protein (hlpA)
HI0691 protein D (hpd)
HI0695 lipoprotein (hel)
HI0820 aldose 1-epimerase precursor (mutarotase) (mro)
HI0821 galactokinase (galK)
HI0822 galactose-1-phosphate uridylyltransferase (galT)
HI0823 galactose operon repressor (galS)
HI0847 hypothetical protein (GB:M94205_1)
HI0848 disulfide oxidoreductase (por)
HI0855 heme-binding lipoprotein (dppA)
HI0919 protective surface antigen D15
HI0930 KW20 catalase (hktE)
HI0959 cyclic AMP receptor protein (crp)
HI1090 superoxide dismutase (sodA)
HI1167 outer membrane protein P5 (ompA)
HI1191 DNA helicase II (uvrD)
HI1397 HindIII modification methyltransferase (hindIIIM)
HI1398 HindIII restriction endonuclease (hindIIIR)
HI1402 DNA polymerase III, chi subunit (holC)
HI1545 lic-1 operon protein (licC)
HI1546 lic-1 operon protein (licD)
HI1585 15 kd peptidoglycan-associated lipoprotein (lpp)
HI1594 formyltetrahydrofolate hydrolase (purU)
HI1595 enolpyruvylshikimate phosphate synthase (aroA)
HI1699 lsg locus hypothetical protein (GB:M94855_8)
HI1700 lsg locus hypothetical protein (GB:M94855_7)
HI1701 lsg locus hypothetical protein (GB:M94855_6)
HI1702 lsg locus hypothetical protein (GB:M94855_5)
HI1703 lsg locus hypothetical protein (GB:M94855_4)
HI1704 lsg locus hypothetical protein (GB:M94855_3)
HI1705 lsg locus hypothetical protein (GB:M94855_2)
HI1706 lsg locus hypothetical protein (GB:M94855_1)

[0272] TABLE 2 Unidentified ORFs of the H. influenzae Rd genome. HI00033249 2464 HI0004 3729 3268 HI0012 11778 12767 HI0017 17829 17449 HI001920239 18819 HI0021 23349 22102 HI0028 29582 29307 HI0033 35298 34834HI0034 35660 35355 HI0035 37440 35788 HI0040 43059 42286 HI0042 4459443923 HI0043 45658 44597 HI0044 46380 45721 HI0045 47261 46710 HI004647328 47687 HI0050 51426 50224 HI0051 51998 51504 HI0052 53023 52040HI0053 54078 53053 HI0056 56966 56256 HI0059 60728 59733 HI0065 6783968312 HI0072 78167 77313 HI0073 79220 78879 HI0074 79653 79216 HI007783046 83909 HI0080 85983 86411 HI0081 86556 87341 HI0082 87601 87864HI0083 87882 88094 HI0090 96604 97314 HI0091 98493 97360 HI0092 9976198505 HI0093 100989 99686 HI0094 101511 101194 HI0096 102950 103522HI0100 107807 107415 HI0101 108091 107654 HI0103 109598 109257 HI0105111789 112625 HI0107 114405 115612 HI0108 115744 116634 HI0109 117067116729 HI0112 119485 119847 HI0114 122424 122311 HI0115 128606 130242HI0116 130860 130246 HI0117 131552 131800 HI0120 134883 134380 HI0121136357 134999 HI0125 140096 141409 HI0126 142556 141573 HI0127 142955143011 HI0128 142718 142584 HI0130 145160 144804 HI0131 145840 145136HI0134 147247 148419 HI0135 148422 149609 HI0135 151208 149695 HI0144159021 158125 HI0146 160156 159932 HI0147 160966 161952 HI0148 161966163864 HI0149 164031 165167 HI0150 165574 165762 HI0153 168744 168040HI0160 174988 174467 HI0163 178311 177715 HI0165 179007 180080 HI0166180130 180348 HI0168 181562 182313 HI0169 182316 182567 HI0170 182570182938 HI0171 182945 183537 HI0173 184932 185969 HI0174 185975 186232HI0175 186247 187500 HI0176 188281 187550 HI0177 189257 188286 HI0178189365 190150 HI0179 190715 190236 HI0183 195295 196233 HI0184 196413197855 HI0185 198872 198048 HI0188 200705 201555 HI0189 201568 202335HI0196 208646 208611 HI0199 213460 214224 HI0204 218138 217605 HI0206218715 219485 HI0211 225095 225199 HI0218 234170 234697 HI0220 238722238084 HI0228 256953 256489 HI0229 257403 257032 HI0231 259913 
260854HI0233 262997 264382 HI0234 264390 264539 H02352 64822 264679 HI0236265239 265033 HI0238 265736 266389 HI0239 266350 266781 HI0243 270426270208 HI0244 270941 270426 HI0247 274159 273716 HI0257 285979 286623HI0258 286796 286879 HI0259 286880 288054 HI0260 288240 288058 HI0261288839 288180 HI0262 289503 288919 HI0267 298808 298450 HI0268 298891299487 HI0272 304213 303284 HI0273 305079 304216 HI0277 309032 310684HI0278 311516 310710 HI0279 311998 311516 HI0280 312417 312004 HI0281312664 312371 HI0283 315199 313886 HI0284 315200 316061 HI0286 318836319252 HI0293 327115 326912 HI0295 327473 327856 HI0301 333498 333052HI0305 337302 338036 HI0306 338036 338593 HI0307 338596 339012 HI0308339973 339068 HI0310 340854 342017 HI0312 343117 343401 HI0313 343271343092 HI0317 346507 345770 HI0318 347143 346670 HI0320 349150 349665HI0321 349721 350002 HI0322 349998 350444 HI0325 351245 351649 HI0327352729 354078 HI0328 354114 354374 HI0329 354653 354697 HI0331 355655356668 HI0335 359242 360555 HI0338 363320 363910 HI0340 364253 365296HI0342 367615 368352 HI0343 368440 368781 HI0344 368990 369516 HI0345369512 369790 HI0346 369815 372311 HI0347 372369 373205 HI0348 373208374068 HI0349 374068 374517 HI0352 377303 376029 HI0354 379329 378637HI0355 379330 380044 HI0357 380765 381167 HI0358 381227 381171 HI0361384039 383227 HI0365 386932 387009 HI0366 387928 387053 HI0367 388154389323 HI0368 389428 389964 HI0369 390039 390947 HI0372 393364 393975HI0373 394223 394032 HI0376 397168 396485 HI0377 397743 397222 HI0378398079 397759 HI0381 400309 399860 HI0382 401087 400365 HI0388 406077405670 HI0390 408337 409044 HI0391 409072 409620 HI0393 413144 412599HI0394 414371 413637 HI0395 415645 414557 HI0397 416445 416750 HI0398416756 417967 HI0400 419468 420118 HI0402 421340 421056 HI0406 425499424210 HI0407 426365 425502 HI0414 433167 432202 HI0417 437163 437957HI0418 437953 438759 HI0419 438773 439450 HI0420 439398 440738 HI0422442434 2730 HI0423 443077 442916 HI0425 444797 445516 HI0426 
446607445555 HI0433 454103 453516 HI0434 454932 454142 HI0444 463691 464053HI0451 472389 471856 HI0453 472951 472763 HI0454 474321 473026 HI0455474896 474375 HI0456 475705 474926 HI0458 477453 476743 HI0466 485905486561 HI0468 488712 487873 HI0469 489585 488725 HI0471 491037 492317HI0478 497647 497796 HI0489 507333 506959 HI0490 507449 508048 HI0491508051 508521 HI0492 508274 508038 HI0493 508854 509354 HI0494 509815509856 HI0495 509856 510253 HI0496 510797 510306 HI0497 511011 510814HI0502 516228 517265 HI0509 523382 523930 HI0510 524561 524076 HI0511525540 524616 HI0512 525587 526303 HI0521 542216 540966 HI0522 543103542318 HI0523 544656 543115 HI0524 544869 545522 HI0525 546551 545484HI0528 549859 549044 HI0554 571956 572576 HI0556 575147 574608 HI0557575547 575211 HI0559 576210 576091 HI0562 578540 580381 HI0563 581038580382 HI0564 581352 581744 HI0567 584110 583439 HI0570 587757 587551HI0572 591096 590482 HI0574 592124 592846 HI0576 593256 593978 HI0577594070 594732 HI0578 594735 595112 HI0579 595480 595764 HI0587 607340606504 HI0588 607795 607361 HI0591 610092 610508 HI0594 614632 614441HI0595 616566 616775 HI0596 616702 615176 HI0599 619155 619970 HI0600620322 619999 HI0619 650498 651154 HI0626 663569 664921 HI0628 666387666770 HI0629 666663 667117 HI0635 672600 672893 HI0636 672699 673879HI0638 677932 677645 HI0640 679087 679701 HI0649 691619 690906 HI0652694996 694787 HI0655 696806 697567 HI0658 699494 698946 HI0660 701972700059 Hl0661 702429 702136 Hl0662 702781 702425 HI0664 706058 705867HI0667 711078 710050 HI0668 711395 711078 HI0670 713054 713269 HI0672713806 714236 HI0673 715017 714544 HI0674 715691 714544 HI0675 715969715694 HI0679 719498 719061 HI0689 731017 731928 HI0690 732026 732334HI0696 737789 738508 HI0698 743511 739619 HI0699 744964 743524 HI0700745259 744239 HI0702 746523 746065 HI0703 746632 747648 HI0704 747649748418 HI0706 749006 749188 HI0708 749180 749148 HI0720 765555 766304HI0721 766361 766750 HI0723 768095 767817 HI0725 768792 
770060 HI0726776311 776868 HI0727 776875 777312 HI0732 786122 783778 HI0733 786625786245 HI0734 786731 786582 HI0735 787647 786715 HI0737 788457 789167HI0742 799454 800908 HI0743 801060 801386 HI0744 801027 800965 HI0746802425 801982 HI0755 816503 817648 HI0757 819456 818531 HI0758 820676819447 HI0762 823117 823386 HI0763 823404 824474 HI0764 825768 825091HI0768 829290 828811 HI0769 829882 829304 HI0774 835432 834092 HI0775836100 835432 HI0777 836970 837914 HI0789 843493 844095 HI0808 854572855375 HI0809 656603 855413 HI0812 860092 859214 HI0819 868114 667569HI0827 876702 877433 HI0828 677442 877996 HI0829 677999 878460 HI0833881059 881640 HI0839 887221 886541 HI0840 887844 687278 HI0841 888779887757 HI0842 888896 889111 HI0843 889116 890870 HI0844 891071 891898HI0845 891925 892059 HI0847 892866 893129 HI0849 893822 894164 HI0851895374 896144 HI0852 896141 896572 HI0853 896977 897510 HI0854 897510898898 HI0856 900867 901625 HI0857 902112 901768 HI0859 905068 905367HI0860 905688 906248 HI0862 909726 908989 HI0863 912130 909785 HI0864913029 912325 HI0866 915792 913945 HI0868 918419 918538 HI0871 920692921246 HI0872 921338 921439 HI0873 922696 923613 HI0876 927351 926155HI0880 931427 930509 HI0883 932310 933296 HI0884 933350 934084 HI0888938667 939068 HI0892 943690 944319 HI0893 944315 944518 HI0904 957295958086 HI0905 957488 957174 HI0908 959765 960283 HI0909 960628 960317HI0910 960708 961007 HI0914 966380 967141 HI0920 974685 973357 HI0922976298 975582 HI0927 983767 983405 HI0928 984057 983800 HI0931 988229987051 HI0932 988850 988233 HI0933 989308 988826 HI0935 991961 990760HI0936 993112 991961 HI0937 993639 993112 HI0938 995546 993642 HI0940996553 997110 HI0941 997170 997883 HI0942 997886 998566 HI0943 998544998846 HI0945 1002315 1002762 HI0950 1008217 1007987 HI0957 10132461013899 HI0958 1013924 1014091 HI0960 1016378 1015203 HI0961 10174261016374 HI0962 1017780 1017433 HI0963 1018172 1017783 HI0965 10220391021104 HI0966 1023606 1022077 HI0967 1023993 1024175 
HI0968 10248431024944 HI0969 1024817 1024254 HI0976 1030609 1031712 HI0978 10339941034863 HI0979 1034868 1035440 HI0981 1036523 1037512 HI0986 10410671040252 HI0988 1042709 1044301 HI0990 1045642 1047047 HI0998 10616071062044 HI0999 1062363 1063049 HI1002 1063710 1063967 HI1003 10639701065592 HI1005 1067299 1067478 HI1006 1067384 1069165 HI1007 10692561070812 HI1009 1071385 1072338 HI1012 1073835 1074737 HI1013 10747431075981 HI1016 1077448 1078392 HI1018 1079890 1080315 HI1021 10821751083170 HI1022 1083178 1084791 HI1023 1084736 1085422 HI1026 10894661088792 HI1028 1091065 1090208 HI1029 1091066 1092597 HI1030 10935811092598 HI1031 1094889 1093615 HI1032 1095371 1094889 HI1033 10964411095446 HI1034 1096617 1097420 HI1036 1098535 1099023 HI1038 11002591100810 HI1039 1101878 1100997 HI1040 1102257 1103456 HI1041 11035351103386 HI1045 1108332 1107835 HI1046 1108943 1108335 HI1050 11131981114304 HI1055 1117984 1118322 HI1056 1119807 1118428 HI1057 11212391119698 HI1058 1123210 1123287 HI1060 1123449 1122868 HI1065 11270361126827 HI1066 1128454 1127000 HI1072 1135049 1133604 HI1073 11352341134995 HI1074 1137513 1135267 HI1075 1137884 1137513 HI1076 11383371137888 HI1084 1148702 1148448 HI1085 1149040 1148726 HI1086 11496951149054 HI1087 1150228 1149728 HI1088 1151024 1150242 HI1091 11531411153776 HI1092 1153784 1154446 HI1093 1154507 1155244 HI1094 11552891155489 HI1095 1155489 1156007 HI1096 1156007 1157950 HI1097 11580921158634 HI1098 1158637 1160013 HI1099 1160451 1160492 HI1100 11605011160632 HI1101 1160637 1160942 HI1103 1164060 1163077 HI1107 11668041168024 HI1121 1184774 1184115 HI1128 1191629 1192577 HI1129 11934611193234 HI1131 1195069 1195242 HI1132 1195447 1195899 HI1133 11959331196895 HI1149 1215838 1214972 HI1150 1216338 1215847 HI1151 12170661216344 HI1152 1217588 1217073 HI1153 1218198 1217572 HI1154 12187701218237 HI1156 1220425 1220961 HI1158 1223159 1222695 HI1165 12312431230773 HI1168 1235872 1236231 HI1171 1238778 1239119 HI1172 12397291239166 
HI1176 1242916 1243383 HI1178 1244125 1244051 HI1179 12443601244142 HI1184 1248098 1247517 HI1185 1248305 1248859 HI1186 12489341249107 HI1193 1256974 1256552 HI1194 1257654 1257067 HI1195 12578101257950 HI1198 1260250 1261479 HI1201 1263689 1264309 HI1202 12643601265430 HI1205 1267550 1268050 HI1206 1270263 1268131 HI1208 12717511272191 HI1218 1282515 1283219 HI1219 1283219 1283904 HI1225 12917591292049 HI1226 1292052 1293239 HI1237 1306218 1306673 HI1238 13072991306835 HI1239 1308273 1307173 HI1243 1313696 1313037 HI1244 13137941314591 HI1246 1316522 1315827 HI1247 1317233 1316616 HI1249 13199111321851 HI1251 132SS06 1324541 HI1252 1326129 1325512 HI1253 13264541326756 HI1255 1327256 1328923 HI1256 1328946 1329326 HI1257 13293341330392 HI1258 1330618 1330839 HI1259 1330839 1331300 HI1260 13313001331470 HI1265 1339879 1339148 HI1268 1346269 1345733 HI1269 13467561346836 HI1270 1346624 1346241 HI1271 1346849 1347025 HI1272 13470221347135 HI1273 1347135 1347323 HI1276 1348650 1349453 HI1283 13564391356654 HI1284 1356655 1357185 HI1285 1358080 1358502 HI1289 13672271365851 HI1291 1369064 1369447 HI1292 1369450 1370385 HI1294 13724531371617 HI1295 1373365 1372583 HI1296 1373601 1373359 HI1297 13737351373532 HI1300 1375530 1375949 HI1301 1375971 1376663 HI1303 13782361380176 HI1304 1380896 1380210 HI1309 1384563 1385051 HI1312 13867551386510 HI1313 1386780 1387538 HI1317 1391445 1391927 HI1318 13920961392410 HI1319 1392802 1393383 HI1320 1393468 1394280 HI1326 14019701401527 HI1329 1404808 1405533 HI1330 1405533 1405667 HI1335 14090631408968 HI1336 1409263 1408968 HI1340 1412995 1414329 HI1341 14143911414882 HI1343 1416879 1415557 HI1344 1417617 1417009 HI1345 14181331419509 HI1352 1426116 1425637 HI1354 1428276 1427314 HI1358 14335351433996 HI1367 1450229 1449366 HI1369 1453591 1453010 HI1371 14587061455929 HI1372 1461329 1458813 HI1378 1469827 1470732 HI1379 14707381471610 HI1391 1481365 1481808 HI1394 1484556 1485554 HI1399 14923911492023 HI1400 1493035 1492616 
HI1401 1493171 1493004 HI1404 14954471496052 HI1405 1496978 1496157 HI1407 1498433 1498230 HI1408 14990141498469 HI1409 1499166 1499050 HI1410 1500612 1499515 HI1411 15010291500676 HI1413 1503610 1504026 HI1414 1504094 1502787 HI1415 15052801504099 HI1417 1506471 1505953 HI1418 1506880 1506602 HI1419 15070671506795 HI1421 1507987 1507634 HI1422 1508392 1508327 HI1423 15090301508428 HI1424 1509352 1509648 HI1425 1509648 1509938 HI1426 15102501509975 HI1427 1510403 1510975 HI1428 1511264 1511545 HI1431 15137761514795 HI1432 1514998 1515831 HI1439 1521750 1522223 HI1440 15222241525568 HI1441 1525569 1525820 HI1443 1526752 1528626 HI1450 15333581533038 HI1454 1536172 1536492 HI1455 1536633 1536668 HI1456 15371501536566 HI1458 1538541 1537903 HI1460 1540315 1539812 HI1462 15411011541340 HI1468 1547394 1546060 HI1474 1554422 1554078 HI1477 15572411556189 HI1481 1560071 1559355 HI1482 1560378 1560563 HI1484 15627201562989 HI1486 1563395 1562928 HI1487 1564353 1564667 HI1489 15651911565349 HI1490 1565824 1566042 HI1491 1566045 1566215 HI1492 15662211566778 HI1494 1567509 1568060 HI1495 1568255 1568467 HI1497 15686971569200 HI1498 1569285 1569566 HI1500 1569836 1570093 HI1501 15700931570344 HI1502 1570465 1570689 HI1503 1570599 1571015 HI1504 15713431571909 HI1505 1571912 1573435 HI1506 1573450 1575009 HI1507 15751031576344 HI1510 1578223 1579146 HI1511 1579232 1579486 HI1512 15795011579614 HI1513 1579620 1580042 HI1514 1580012 1580593 HI1515 15806091580797 HI1516 1580800 1582260 HI1517 1582273 1582626 HI1518 15826421583022 HI1519 1583106 1584998 HI1520 1584526 1584371 HI1523 15873161587624 HI1524 1587664 1588209 HI1525 1588221 1588625 HI1526 15886281589692 HI1527 1589781 1590284 HI1528 1590287 1592155 HI1529 15927721593659 HI1530 1593826 1593975 HI1540 1605903 1606442 HI1541 16064261607595 HI1542 1607568 1607912 HI1548 1613326 1613877 HI1549 16144821613931 HI1551 1616455 1615214 HI1552 1616740 1617159 HI1554 16198071618560 HI1558 1622639 1621995 HI1561 1626292 1625114 
HI1564 1628971 1628171 HI1566 1630319 1629852 HI1568 1631692 1631537 HI1569 1632481 1631948 HI1570 1632603 1632517 HI1572 1633105 1633257 HI1575 1636870 1636721 HI1576 1637376 1636870 HI1577 1637498 1637439 HI1586 1647922 1647857 HI1587 1648198 1648028 HI1588 1648605 1648189 HI1592 1654749 1653193 HI1596 1659183 1657846 HI1597 1659861 1659247 HI1599 1661605 1661453 HI1600 1662311 1661643 HI1601 1662648 1662328 HI1604 1665779 1664724 HI1605 1666807 1666094 HI1606 1667750 1666800 HI1607 1668067 1667783 HI1608 1668561 1668800 HI1609 1668769 1669446 HI1611 1670802 1671410 HI1613 1672733 1673359 HI1614 1673350 1674312 HI1618 1678855 1677464 HI1626 1686816 1686316 HI1627 1687436 1686819 HI1628 1687921 1687439 HI1630 1688617 1687937 HI1631 1689671 1689177 HI1632 1690500 1690847 HI1633 1690388 1689675 HI1634 1690881 1691282 HI1637 1693111 1692542 HI1643 1702285 1700876 HI1649 1707768 1708781 HI1653 1711982 1712854 HI1654 1712909 1713433 HI1656 1715939 1716046 HI1657 1716442 1716167 HI1658 1717744 1717196 HI1659 1718225 1717860 HI1660 1720257 1719409 HI1661 1720329 1722053 HI1662 1722056 1722412 HI1663 1722428 1723010 HI1669 1732543 1731909 HI1670 1733332 1732556 HI1671 1733482 1733363 HI1672 1733919 1733539 HI1673 1735404 1733938 HI1675 1737711 1737589 HI1677 1738407 1739654 HI1678 1739641 1742283 HI1683 1745073 1745741 HI1685 1747304 1747843 HI1686 1750100 1747947 HI1687 1750833 1750171 HI1689 1752090 1753040 HI1690 1753041 1753619 HI1693 1757163 1757783 HI1694 1757788 1758492 HI1707 1770253 1770993 HI1709 1774757 1773684 HI1710 1775859 1774744 HI1715 1782227 1781865 HI1716 1782482 1782345 HI1720 1786560 1785523 HI1721 1786631 1787176 HI1723 1788842 1788747 HI1724 1789761 1788979 HI1726 1792471 1793034 HI1727 1793205 1793852 HI1729 1794860 1795201 HI1730 1795161 1795556 HI1736 1803407 1802481 HI1737 1804045 1803407 HI1742 1813528 1813298 HI1743 1813960 1813634 HI1744 1814691 1813960
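
The coordinate entries above can be read mechanically as repeated triples of an identifier followed by two genome positions. The following Python sketch is illustrative only: the function name is hypothetical, and it assumes that a first coordinate larger than the second indicates a region on the complementary strand and that lengths are counted inclusively. Neither assumption is stated in this portion of the table.

```python
import re

# Hypothetical helper: reads "HIxxxx <pos1> <pos2>" triples from a text run.
TRIPLE = re.compile(r"(HI\d{4})\s+(\d+)\s+(\d+)")

def parse_coordinate_rows(text):
    """Return a list of (id, pos1, pos2, length_bp, strand) tuples."""
    rows = []
    for gene_id, first, second in TRIPLE.findall(text):
        pos1, pos2 = int(first), int(second)
        # Assumption: pos1 > pos2 means the region lies on the reverse strand.
        strand = "+" if pos1 <= pos2 else "-"
        # Assumption: both end coordinates are included in the length.
        length_bp = abs(pos2 - pos1) + 1
        rows.append((gene_id, pos1, pos2, length_bp, strand))
    return rows

sample = "HI1176 1242916 1243383 HI1178 1244125 1244051"
for row in parse_coordinate_rows(sample):
    print(row)
```

Entries whose two coordinates are fused into a single 14-digit run will not match the pattern and are skipped, so the input should be checked for such runs first.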

[0273] TABLE 3 Whole Genome Sequencing Strategy.

Stage: Description
Random small insert and large insert library construction: Randomly sheared genomic DNA on the order of 2 kb and 15-20 kb respectively
Library Plating: Verify random nature of library and maximize random selection of small insert and large insert clones for template production
High-throughput DNA sequencing: Sequence sufficient number of sequence fragments from both ends for 6× coverage
Assembly: Assemble random sequence fragments and identify repeat regions
Gap closure
  a. Physical gaps: Order all contigs (fingerprints, peptide links, lambda clones, PCR) and provide templates for closure
  b. Sequence gaps: Complete the genome sequence by primer walking
Editing: Visual inspection and resolution of sequence ambiguities, including frameshifts
Annotation: Identification and description of all predicted coding regions (putative identifications, starts and stops, role assignments, operons, regulatory regions)

[0274] TABLE 4 The theory of shotgun sequencing follows from the application of the equation for the Poisson distribution $p_x = m^x e^{-m}/x!$, where x is the number of occurrences of an event and m is the mean number of occurrences. The numbers below predict the assembly of a 1.9 Mb genome with an average sequence fragment size of 460 bp.

N        % unsequenced   bp unsequenced   DS Gaps   Avg. Gap Length
250      94.44           1794304          236       7600
500      89.18           1694487          446       3800
1,000    79.54           1511204          795       1900
2,000    63.26           1201967          1265      950
3,000    50.32           956009           1509      633
5,000    31.83           604785           1592      380
10,000   10.13           192508           1013      190
15,000   3.23            61277            484       127
20,000   1.03            19505            205       95
25,000   0.33            6209             82        76
30,000   0.10            1976             31        63
50,000   0.00            20               1         38
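
The Poisson calculation behind Table 4 can be sketched in Python as follows. The sketch rests on assumptions not spelled out in the table: the mean m is taken as N × L / G for N fragments of average length L over a genome of G bp, the unsequenced fraction is the x = 0 term e^(−m), and the number of gaps is estimated as N·e^(−m). The function names are hypothetical. With L = 460 bp the percentages come out somewhat lower than the tabulated ones, which may reflect an effective fragment length shorter than 460 bp. The average gap length, which reduces to G/N, agrees with the table.

```python
import math

def poisson_pmf(x, m):
    """Poisson probability of exactly x occurrences given mean m."""
    return m ** x * math.exp(-m) / math.factorial(x)

def shotgun_prediction(n_fragments, fragment_bp=460, genome_bp=1_900_000):
    """Return (m, % unsequenced, bp unsequenced, gaps, average gap length)."""
    # Assumption: mean coverage per base is N * L / G.
    m = n_fragments * fragment_bp / genome_bp
    p0 = poisson_pmf(0, m)            # probability a base is never sequenced
    unsequenced_bp = genome_bp * p0
    gaps = n_fragments * p0           # assumed gap estimate, N * e^(-m)
    avg_gap_bp = unsequenced_bp / gaps
    return m, 100 * p0, unsequenced_bp, gaps, avg_gap_bp

for n in (250, 1_000, 10_000):
    m, pct, bp, gaps, avg_gap = shotgun_prediction(n)
    print(f"N={n:>6}  m={m:.3f}  unsequenced={pct:.2f}%  "
          f"bp={bp:.0f}  gaps={gaps:.0f}  avg gap={avg_gap:.0f}")
```

Passing a different `fragment_bp` shows how sensitive the predicted unsequenced fraction is to the assumed read length.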

[0275] TABLE 5 Summary of features of whole genome sequencing of H. influenzae Rd

Description                                                   Number
Double stranded templates                                     19,687
Forward sequencing reactions (M13-21 primer)                  19,346
  # Successful (%)                                            16,240 (84%)
  Average edited read length                                  485 bp
Reverse sequencing reactions (M13RP1 primer)                  9,297
  # Successful (%)                                            7,744 (83%)
  Average edited read length                                  444 bp
Sequence fragments in random assembly                         24,304
  Total # of base pairs                                       11,631,485
  # of contigs                                                140
Physical gap closure                                          42
  PCR                                                         37
  Southern analysis                                           15
  Lambda clones                                               23
  Peptide links                                               2
Terminator sequencing reactions⁺                              3,102
  # Successful (%)                                            2,024 (65%)
  Average edited read length                                  375 bp
Genome Size                                                   1,830,121 bp
# of N's in sequence (%)                                      188 (0.01%)
Coordinates of proposed origin of replication                 602,483-602,764
G/C content                                                   38%
# of rRNA                                                     6
  rrnA, rrnC, rrnD (spacer region)                            723 bp
  rrnB, rrnE, rrnF (spacer region)                            478 bp
# of tRNA genes identified                                    574
Number of Predicted Coding Regions                            1,749
# Unassigned role (%)                                         724 (41%)
  No database match                                           384
  Match hypothetical proteins                                 340
# Assigned role (%)                                           1,025 (59%)
  Amino acid metabolism                                       71 (6.9%)
  Fatty acid/phospholipid metabolism                          24 (2.3%)
  Biosynthesis of cofactors, prosthetic groups, and carriers  54 (5.2%)
  Purines, pyrimidines, nucleosides, nucleotides              54 (5.3%)
  Central intermediary metabolism                             31 (3.0%)
  Energy metabolism                                           99 (9.7%)
  Cell envelope                                               82 (8.0%)
  Regulatory functions                                        63 (6.1%)
  Replication                                                 88 (8.6%)
  Transcription                                               27 (2.5%)
  Translation                                                 146 (14.2%)
  Transport/binding proteins                                  145 (14.1%)
  Cellular processes                                          42 (4.1%)
  Other                                                       99 (9.7%)
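
Several figures in Table 5 can be cross-checked with simple arithmetic. The Python sketch below reads the reaction and fragment counts as whole numbers and uses hypothetical variable and function names. It computes the average redundancy implied by the total base pairs in the random assembly and recomputes the reported success percentages.

```python
# Values taken from Table 5, read as whole numbers.
GENOME_BP = 1_830_121
ASSEMBLED_BP = 11_631_485

def success_rate(successful, attempted):
    """Percentage of sequencing reactions that succeeded."""
    return 100 * successful / attempted

# Average redundancy: assembled base pairs divided by genome size.
coverage = ASSEMBLED_BP / GENOME_BP
print(f"coverage = {coverage:.2f}x")

reactions = {
    "forward (M13-21)": (16_240, 19_346),
    "reverse (M13RP1)": (7_744, 9_297),
    "terminator": (2_024, 3_102),
}
for name, (ok, total) in reactions.items():
    print(f"{name}: {success_rate(ok, total):.1f}% successful")
```

The computed redundancy is a little above six-fold, in line with the 6× coverage target named in Table 3, and the three success rates round to the 84%, 83% and 65% listed.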

[0276] TABLE 6 Two component systems in H. influenzae Rd

ID       Location    Best Match       % ID   % Sim   Length (bp)
Sensors:
HI0221   239,378     arcB {E. coli}   39.5   63.9    200
HI0269   299,541     narQ {E. coli}   38.1   68.0    562
HI1713   1,781,143   basS {E. coli}   27.7   51.5    250
HI1381   1,475,017   phoR {E. coli}   38.1   61.6    280
Regulators:
HI0728   777,934     narP {E. coli}   59.3   77.0    209
HI0839   887,011     cpxR {E. coli}   51.9   73.0    229
HI0886   936,624     arcA {E. coli}   77.2   87.8    236
HI1382   1,475,502   phoB {E. coli}   52.9   71.4    228
HI1714   1,781,799   basR {E. coli}   43.5   59.3    219

[0277]

SEQUENCE LISTING

The patent application contains a lengthy “Sequence Listing” section. A copy of the “Sequence Listing” is available in electronic form from the USPTO web site (http://seqdata.uspto.gov/sequence.html?DocID=20040018503). An electronic copy of the “Sequence Listing” will also be available from the USPTO upon request and payment of the fee set forth in 37 CFR 1.19(b)(3).

What is claimed is:
 1. An isolated polynucleotide comprising a nucleic acid fragment of the Haemophilus influenzae Rd genome, wherein said fragment consists of the nucleotide sequence of any one of the fragments of SEQ ID NO: 1 depicted in Table 1a or a degenerate variant thereof, excluding the fragments of SEQ ID NO: 1 depicted in Table 1b.
 2. A vector comprising the polynucleotide of claim 1.
 3. An organism which has been altered to contain the polynucleotide of claim 1.
 4. An isolated polypeptide encoded by any one of the fragments of the Haemophilus influenzae Rd genome depicted in Table 1a or by a degenerate variant of said fragment, excluding the fragments of SEQ ID NO: 1 depicted in Table 1b.
 5. An antibody which specifically binds the polypeptide of claim 4.
 6. An isolated polynucleotide encoding the polypeptide of claim 4.
 7. A vector comprising the polynucleotide of claim 6.
 8. An organism which has been altered to contain the polynucleotide of claim 7.
 9. A method for producing a polypeptide in a host cell comprising the steps of: (a) incubating a host containing a heterologous nucleic acid molecule whose nucleotide sequence consists of the polynucleotide of claim 6 under conditions where said heterologous nucleic acid molecule is expressed to produce said polypeptide, and (b) isolating said polypeptide.